Abstract: A real-time communication system with two
encoders communicating with a single receiver over separate noisy
channels is considered. The two encoders make distinct partial
observations of a Markov source. Each encoder must encode its
observations into a sequence of discrete symbols. The symbols are
transmitted over noisy channels to a finite memory receiver that 
attempts to reconstruct some function of the state of the Markov 
source. Encoding and decoding must be done in real-time, that 
is, the distortion measure does not tolerate delays. Under the 
assumption that the encoders' observations are conditionally 
independent Markov chains given an unobserved time-invariant 
random variable, results on the structure of optimal real-time 
encoders and the receiver are obtained. It is shown that there 
exist finite-dimensional sufficient statistics for the encoders. The 
problem with noiseless channels and perfect memory at the 
receiver is then considered. A new methodology to find the 
structure of optimal real-time encoders is employed. A sufficient 
statistic with a time-invariant domain is found for this problem. 
This methodology exploits the presence of common information 
between the encoders and the receiver when communication is 
over noiseless channels. 



I. Introduction 

A large variety of decentralized systems require communi- 
cation between various devices or agents. In general, since 
such systems may have multiple senders and receivers of 
information, the models of point-to-point communication are 
not sufficient. Typically in decentralized systems, the purpose 
of communication is to achieve a higher system objective. 
Examples include networked control systems where the over- 
all objective of communication between various sensors and 
controllers is to control the plant in order to achieve a 
performance objective, or sensor networks where the goal of 
communication between sensors and a fusion center may be to
quickly estimate a physical variable or to track in real-time the 
evolution of a physical phenomenon. In such systems, agents 
(sensors, controllers etc.) have to make decisions that affect 
the overall system performance based only on information 
they currently have gathered from the environment or from 
other agents through the underlying communication system. 
The communication problem therefore should not only address 
what information can be made available to each agent but also 
when this information is available. Thus, the overall system
objectives may impose constraints on the time delay associated
with communication. 

In the presence of strict delay constraints on information 
transmission, the communication problem becomes drastically 
different from the classical information-theoretic formulations. 
Information theory deals with encoding and decoding of 
long sequences which inevitably results in large undesirable 
delays. For systems with fixed (and typically small) delay 
requirements, the ideas of asymptotic typicality cannot be
used. Moreover, information-theoretic bounds on the trade-off
between delay and reliability are only asymptotically tight and
are of limited value for short sequences ([1]). Therefore, we 
believe that the development of a real-time communication 
theory can significantly contribute to our fundamental under- 
standing of the operation of decentralized systems. 

In this paper we address some issues in multi-terminal 
communication systems under the real-time constraint. Specif- 
ically, we look at problems with multiple senders/encoders 
communicating with a single receiver. We analyze systems 
with two encoders as in Figure 1, although our results gen- 
eralize to n encoders (n > 2) and a single receiver. The 
two encoders make distinct partial observations of a discrete- 
time Markov source. Each encoder must encode in real-time 
its observations into a sequence of discrete variables that are 
transmitted over separate noisy channels to a common receiver. 
The receiver must estimate, in real-time, a given function of 
the state of the Markov source. The main feature of this
multi-terminal problem that distinguishes it from a point-to-point
communication problem is the presence of coupling between
the encoders (that is, each encoder must take into account
what the other encoder is doing). This coupling arises for
the following reasons: 1) The encoders' observations are
correlated with each other. 2) The encoding problems are
further coupled because the receiver wants to minimize a
non-separable distortion metric. That is, the distortion metric
cannot be simplified into two separate functions each one of
which depends only on one encoder's observations. The nature 
of optimal strategies strongly depends on the nature and extent 
of the coupling between the encoders. 

Our model therefore involves real-time distributed coding of 
a pair of correlated observations that are to be transmitted over 
noisy channels. Information-theoretic results on asymptotically
achievable rate-regions have been known for some distributed
coding problems. The first available results on distributed
coding of correlated memoryless sources appear in [2] and [3].
Multiple access channels with arbitrarily correlated sources
were considered in [4]. In [5], the encoders make noisy
observations of an i.i.d source. The authors in [5] characterize 
the achievable rates and distortions, and propose two specific 
distributed source coding techniques. Constructive methods for 
distributed source coding were presented in [6], [7] and [8]. 
In particular, [6] addresses lossless and nearly lossless source
coding for the multiple access system, and [7] addresses zero- 
error distributed source coding. The CEO problem, where a 
number of encoders make conditionally independent observa- 
tions of an i.i.d source, was presented in [9]. The case where 
the number of encoders tends to infinity was investigated there. 
The quadratic Gaussian case of the CEO problem has been 
investigated in [10], [11] and [12]. Bounds on the achievable 
rate-regions for finitely many encoders were found in [13]. A 
lossy extension of the Slepian-Wolf problem was analyzed in 
[14]. Multi-terminal source coding for memoryless Gaussian 
sources was considered in [15]. 

In [16],[17],[18],[19],[20] and [21], distributed source cod- 
ing problems with the objective of reconstructing a function 
of the source are investigated. In [16], the authors consider 
distributed source coding of a pair of correlated Gaussian 
sources. The objective is to reconstruct a linear combination of 
the two sources. The authors discover an inner bound on the 
optimal rate-distortion region and provide a coding scheme 
that achieves a portion of this inner bound. The problem 
of distributed source coding to reconstruct a function of the 
sources losslessly was considered in [17]. An inner bound was 
obtained for the performance limit which was shown to be 
optimal if the sources are conditionally independent given the 
function. The case of lossless reconstruction of the modulo-2 
sum of two correlated binary sources was considered in [18]. 
These results were extended in [21] (see Problem 23 on page 
400) and [19]. An improved inner bound for the problem in 
[18] was provided in [20]. 

The real-time constraint of our problem differentiates it from 
the information-theoretic results mentioned above. Real-time 
communication problems for point-to-point systems have been 
studied using a decision-theoretic/stochastic control perspec- 
tive. In general, two types of results have been obtained for 
point-to-point systems. One type of result establishes qualitative
properties of optimal encoding and decoding strategies. The
central idea here has been to consider the encoders and the
decoders as control agents/decision-makers in a team trying
to optimize a common objective of minimizing a distortion
metric between the source and its estimates at the receiver.
Such sequential dynamic teams - where the agents sequen- 
tially make multiple decisions in time and may influence 
each other's information - involve the solution of non-convex 
functional optimization to find the best strategies for the agents 
([22], [23]). However, if the strategies of all but one of the 
agents are fixed, the resulting problem of optimizing a single 
agent's strategy can, in many cases, be posed in the framework 
of Markov decision theory. This approach can explain some of 
the structural results obtained in [24], [25], [26], [27], [28].
Another class of results establishes a decomposition of the problem
of choosing a sequence of globally optimal encoding and
decoding functions. In the resulting decomposition, at each step,
the optimization is over one encoding function and one decoding function
instead of a sequence of functions. This optimization, however,
must be repeated for all realizations of an information state 
that captures the effect of past encoding/decoding functions 
([26],[27],[29],[30]). 

Point-to-point communication problems with the real-time
or finite delay constraint were also investigated from an
information-theoretic point of view. We refer the reader to [25]
for a survey of the information-theoretic approaches for point- 
to-point systems with the real-time or finite delay constraint. 

Inspired by the decision-theoretic approach to real-time 
point-to-point systems, we look at our problem from a decen- 
tralized stochastic control/team-theoretic perspective with the 
encoders and the receiver as our control agents/decision mak- 
ers. We are primarily interested in discovering the structure of 
optimal real-time encoding and decoding functions. In other 
words, given all the observations available to an agent (i.e., an
encoder or the receiver), what is a sufficient statistic to decide
its action (i.e., the symbol to be transmitted in case of the
encoders and the best estimate in case of the receiver)? The
structure of optimal real-time encoding and decoding strategies 
provides insights into their essential complexity (for example, 
the memory requirements at the encoders and the receiver 
for finite and infinite time horizon communication problems) 
as well as the effect of the coupling between the encoders 
mentioned earlier. 

A universal approach for discovering the structure of opti- 
mal real-time encoding/decoding strategies in a multi-terminal 
system with any general form of correlation between the 
encoders' observations has so far remained elusive. In this 
paper, we restrict ourselves to a simple model for the encoders' 
observations. For such a model (described in Section III), we
obtain results on the structure of optimal real-time encoding
strategies when the receiver is assumed to have a finite
memory. Our results reveal that for any time horizon, however large
(or even infinite), there exists a finite dimensional sufficient 
statistic for the encoders. This implies that an encoder with a 
memory that can store a fixed finite number of real-numbers 
can perform as well as encoders with arbitrarily large memo- 
ries. Subsequently, we consider communication with noiseless 
channels and remove the assumption of having limited receiver
memory. For this problem, the approach in Section III results
in sufficient statistics for the encoders that belong to spaces
which keep increasing with time. This is undesirable if one 
wishes to look at problems with large/infinite time-horizons. 
In order to obtain a sufficient statistic with time-invariant 
domain, we invent a new methodology for decentralized de- 
cision problems. This methodology highlights the importance 
of common information/common knowledge (in the sense of
[31]), in determining structural properties of decision makers 
in a team. In general, the resulting sufficient statistic belongs 
to an infinite dimensional space. However, we present special 
cases where a finite dimensional representation is possible. 
Moreover, we believe that the infinite dimensional sufficient 
statistic may be intelligently approximated to obtain real-time 
finite-memory encoding strategies whose performance is close 
to optimal. 

The rest of the paper is organized as follows: In Section II
we present a real-time multi-terminal communication system
and formulate the optimization problem. In Section III we
present our assumptions on the nature of the source and the
receiver and obtain structural results for optimal real-time
encoding and decoding strategies. In Section IV we consider
the problem with noiseless channels and perfect receiver
memory. We develop a new methodology to find structural
results for optimal real-time encoders for this case. We look
at some extensions and special cases of our results in Section
V. We conclude in Section VI.

Notation: 1. Throughout this paper, subscripts of the
form $1:t$, like $X_{1:t}$, are used to denote sequences like
$X_1, X_2, \ldots, X_t$.

2. We denote random variables with capital letters ($X$) and
their realizations with small letters ($x$). For random vectors, we
add a tilde ($\tilde{\ }$) over the vector to denote its realization.

3. For continuous random variables (or vectors), $P(X = x)$
refers to $P(x \le X \le x + dx)$.

4. For a set $A$, we use $\Delta(A)$ to denote the space of probability
densities (or probability mass functions) on $A$.

II. A Real-Time Multi-terminal Communication 
Problem 

Consider the real-time communication system shown in 
Figure 1. We have two encoders that partially observe a 
Markov source and communicate it to a single receiver over 
separate noisy channels. The receiver may be interested in 
estimating the state of the Markov source or some function of 
the state of the source. We wish to find sufficient statistics for 
the encoders and the receiver and/or qualitative properties for
the encoding and decoding functions. Below, we elaborate on 
the model and the optimization problem. 
Fig. 1. A Multi-terminal Communication System



A. Problem Formulation 

1) The Model: The state of the Markov source at time $t$ is
described as

$$X_t = (X_t^1, X_t^2)$$

where $X_t^i \in \mathcal{X}^i$, $i = 1,2$, and $\mathcal{X}^1, \mathcal{X}^2$ are finite spaces. The
time-evolution of the source is given by the following equation

$$X_{t+1} = F_t(X_t, W_t) \tag{1}$$

where $W_t$, $t = 1,2,\ldots$ is a sequence of independent random
variables that are independent of the initial state $X_1$.



Two encoders make partial observations of the source. In
particular, at time $t$, encoder 1 observes $X_t^1$ and encoder 2
observes $X_t^2$. The encoders have perfect memory, that is, they
remember all their past observations and actions. At each time
$t$, encoder 1 sends a symbol $Z_t^1$ belonging to a finite alphabet
$\mathcal{Z}^1$ to the receiver. The encoders operate in real-time, that is,
each encoder can select the symbol to be sent at time $t$ based
only on the information available to it till that time. That is,
the encoding rule at time $t$ must be of the form:

$$Z_t^1 = f_t^1(X_{1:t}^1, Z_{1:t-1}^1) \tag{2}$$

where $X_{1:t}^1$ represents the sequence $X_1^1, X_2^1, \ldots, X_t^1$ and
$Z_{1:t-1}^1$ represents the sequence $Z_1^1, Z_2^1, \ldots, Z_{t-1}^1$. In general,
one can allow randomized encoding rules instead of
deterministic encoding functions. That is, for each realization of
its observations till time $t$, encoder 1 selects a probability
distribution on $\mathcal{Z}^1$ and then transmits a random symbol
generated according to the selected distribution. We will show later
that, under our assumptions on the model, such randomized
encoding rules cannot provide any performance gain and we
can restrict our attention to deterministic encoding functions.
Encoder 2 operates in a similar fashion as encoder 1. Thus,
encoding rules of encoder 2 are functions of the form:

$$Z_t^2 = f_t^2(X_{1:t}^2, Z_{1:t-1}^2) \tag{3}$$

where $Z_t^2$ belongs to the finite alphabet $\mathcal{Z}^2$.

The symbols $Z_t^1$ and $Z_t^2$ are transmitted over separate noisy
channels to a single receiver. The channel noises at time $t$ are
mutually independent random variables $N_t^1$ and $N_t^2$ belonging
to finite alphabets $\mathcal{N}^1$ and $\mathcal{N}^2$ respectively. The noise variables
$(N_1^1, N_1^2, N_2^1, N_2^2, \ldots, N_t^1, N_t^2, \ldots)$ form a collection of
independent random variables that are independent of the source
process $X_t$, $t = 1,2,\ldots$.

The receiver receives $Y_t^1$ and $Y_t^2$, which belong to finite
alphabets $\mathcal{Y}^1$ and $\mathcal{Y}^2$ respectively. The received symbols are
noisy versions of the transmitted symbols according to known
channel functions $h_t^1$ and $h_t^2$, that is,

$$Y_t^i = h_t^i(Z_t^i, N_t^i) \tag{4}$$

for $i = 1,2$.

At each time $t$, the receiver produces an estimate of the
source $\hat{X}_t$ based on the symbols received till time $t$, i.e.,

$$\hat{X}_t = g_t(Y_{1:t}^1, Y_{1:t}^2) \tag{5}$$

A non-negative distortion function $\rho_t(X_t, \hat{X}_t)$ measures the
instantaneous distortion between the source and the estimate
at time $t$. (Note that the distortion function may take into
account that the receiver only needs to estimate a function of
$X_t^1$ and $X_t^2$.)

2) The Optimization Problem P: Given the source and noise
statistics, the encoding alphabets, the channel functions $h_t^1, h_t^2$,
the distortion functions $\rho_t$ and a time horizon $T$, the objective
is to find globally optimal encoding and decoding functions
$f_{1:T}^1, f_{1:T}^2, g_{1:T}$ so as to minimize

$$J(f_{1:T}^1, f_{1:T}^2, g_{1:T}) = \mathbb{E}\left\{ \sum_{t=1}^{T} \rho_t(X_t, \hat{X}_t) \right\} \tag{6}$$

where the expectation in (6) is over the joint distribution
of $X_{1:T}$ and $\hat{X}_{1:T}$, which is determined by the given source
and noise statistics and the choice of encoding and decoding
functions $f_{1:T}^1, f_{1:T}^2, g_{1:T}$.

We refer to the collection of functions $f_{1:T}^i$ as encoder $i$'s
strategy ($i = 1,2$). The collection of functions $g_{1:T}$ is the
decoding strategy.

Remarks: 1. Since we consider only finite alphabets for the
source, the encoded symbols, the channel noise, the received
symbols and a finite time horizon, the number of possible
choices of encoding and decoding functions is finite.
Therefore, an optimal choice of strategies $(f_{1:T}^1, f_{1:T}^2, g_{1:T})$ always
exists.

2. A brute force search method to find the optimal strategies can always
be used in principle. It is clear however that even for small
time-horizons, the number of possible choices would be large
enough to make such a search inefficient. Moreover, such a
scheme would not be able to identify any characteristics of
optimal encoding and decoding functions.
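
To get a feel for the size of the search space mentioned in Remark 2, the following sketch counts the deterministic strategies of a single encoder whose rule at time t maps its full history (observations up to t, symbols sent up to t-1) to a symbol, as in (2). The function name and the example alphabet sizes are illustrative assumptions and do not come from the paper; the count treats each rule as an arbitrary function on its whole domain.

```python
def num_encoder_strategies(x_size: int, z_size: int, horizon: int) -> int:
    """Count deterministic strategies f_{1:T} for one encoder.

    At time t the rule maps (x_{1:t}, z_{1:t-1}) to a symbol, so its domain
    has x_size**t * z_size**(t-1) points and there are
    z_size ** (domain size) possible rules at that time.
    """
    total = 1
    for t in range(1, horizon + 1):
        domain = x_size ** t * z_size ** (t - 1)
        total *= z_size ** domain
    return total


# Binary observations, binary symbols, horizon 3 (hypothetical sizes).
print(num_encoder_strategies(2, 2, 3))
```

Even in this toy setting one encoder alone has 2^42 candidate strategies, and the joint search also ranges over the second encoder and the decoder.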

The encoding functions and the decoding functions in
equations (2), (3) and (5) require the encoders and the
receiver to store entire sequences of their past observations and
actions. For large time-horizons storing all past data becomes 
prohibitive. Therefore, one must decide what part of the 
information contained in these arbitrarily large sequences is 
sufficient for decision-making at the encoders and the receiver. 
In particular, we are interested in addressing the following 
questions: 

1) Is there a sufficient statistic for the encoders and the 
decoder that belongs to a time-invariant space? (Clearly, 
all the past data available at an agent is a sufficient 
statistic but it belongs to a space that keeps increasing 
with time.) If such a sufficient statistic exists, one can 
potentially look at problems with large (or infinite) time- 
horizons. 

2) Is there a finite-dimensional sufficient statistic for the 
encoders and the receiver? If such a sufficient statistic 
exists, then we can replace the requirement of storing 
arbitrarily long sequences of past observations/messages 
with storing a fixed finite number of real numbers at the 
encoders and the receiver. 

The above communication problem can be viewed as a 
sequential team problem where the encoders and the receiver 
are the decision-making agents that are sequentially making 
decisions to optimize a common objective. The communica- 
tion problem is a dynamic team problem since the encoders' 
decisions influence the information available to the receiver.
Dynamic team problems are known to be hard. For dynamic
teams, a general answer to the questions on the existence of 
sufficient statistics that either have time-invariant domains or 
are finite-dimensional is not known. In the next section we will 
make simplifying assumptions on the nature of the source and 
the receiver and present sufficient statistics for the encoders. 

III. Problem P1

We consider the optimization problem (Problem P) formulated
in the previous section under the following assumptions
on the source and the receiver.

1. Assumption A1 on the Source: We assume that the
time-evolution of the source can be described by the following
model:

$$X_{t+1}^1 = F_t^1(X_t^1, A, W_t^1) \tag{7a}$$

$$X_{t+1}^2 = F_t^2(X_t^2, A, W_t^2) \tag{7b}$$

where $A$ is a random variable taking values in the finite set $\mathcal{A}$
and $W_t^1$, $t = 1,2,\ldots$ and $W_t^2$, $t = 1,2,\ldots$ are two independent
noise processes (that is, sequences of independent random
variables) that are independent of the initial state ($X_1^1$, $X_1^2$
and $A$) as well. Thus, the transition probabilities satisfy:

$$P(X_{t+1}^1, X_{t+1}^2 \mid X_t^1, X_t^2, A) = P(X_{t+1}^1 \mid X_t^1, A)\, P(X_{t+1}^2 \mid X_t^2, A) \tag{8}$$

The initial state of the Markov source has known statistics that
satisfy the following equation:

$$P(X_1^1, X_1^2, A) = P(X_1^1, X_1^2 \mid A)\, P(A) = P(X_1^1 \mid A)\, P(X_1^2 \mid A)\, P(A) \tag{9}$$

Thus, $A$ is a time-invariant random variable that couples the
evolution of $X_t^1$ and $X_t^2$. Note that conditioned on $A$, $X_t^1$ and
$X_t^2$ form two conditionally independent Markov chains. We
define

$$X_t := (X_t^1, X_t^2, A) \tag{10}$$

which belongs to the space $\mathcal{X} := \mathcal{X}^1 \times \mathcal{X}^2 \times \mathcal{A}$.
The encoders' model is the same as before. Thus encoder 1
observes $X_t^1$ and encoder 2 observes $X_t^2$. Note that the random
variable $A$ is not observed by any encoder. The encoders have
perfect memories and the encoding functions are given by
equations (2) and (3).
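
As an illustration of Assumption A1, the sketch below samples a realization of the source: it first draws the unobserved variable A and then runs two chains that are independent of each other once A is fixed. The function name, the table layout and the use of time-homogeneous transition tables are simplifying assumptions made for this example only; the model above allows time-varying dynamics.

```python
import random


def simulate_source(p_a, p_init, p_trans, horizon, rng):
    """Sample A and two chains that are conditionally independent given A.

    p_a[a]               : prior P(A = a)
    p_init[i][a][x]      : P(X_1^i = x | A = a) for encoder index i in {0, 1}
    p_trans[i][a][x][x2] : P(X_{t+1}^i = x2 | X_t^i = x, A = a)
    Returns (a, path_1, path_2), each path being a list of length `horizon`.
    """
    def draw(pmf):
        # Draw an index according to the given probability mass function.
        return rng.choices(range(len(pmf)), weights=pmf)[0]

    a = draw(p_a)
    paths = []
    for i in range(2):
        x = draw(p_init[i][a])
        path = [x]
        for _ in range(horizon - 1):
            # Given A = a, chain i evolves using only its own previous state.
            x = draw(p_trans[i][a][x])
            path.append(x)
        paths.append(path)
    return a, paths[0], paths[1]
```

Each encoder would see only its own path; the coupling between the two paths enters solely through the shared draw of A.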

2. Assumption A2 on the Receiver: We have a finite memory
receiver that maintains a separate memory for symbols received
from each channel. This memory is updated as follows:

$$M_1^i = l_1^i(Y_1^i), \quad i = 1,2 \tag{11a}$$

$$M_t^i = l_t^i(M_{t-1}^i, Y_t^i), \quad i = 1,2 \tag{11b}$$

where $M_t^i$ belongs to the finite alphabet $\mathcal{M}^i$, $i = 1,2$, and $l_t^i$ are the
memory update functions at time $t$ for $i = 1,2$. For notational
convenience, we take $M_0^i$ to be a fixed constant for $i = 1,2$. The receiver
produces an estimate of the source $\hat{X}_t$ based on its memory
contents at time $t-1$ and the symbols received at time $t$, that
is,

$$\hat{X}_t = g_t(Y_t^1, Y_t^2, M_{t-1}^1, M_{t-1}^2) \tag{12}$$

We now formulate the following problem.

Problem P1: With assumptions A1 and A2 as above,
and given source and noise statistics, the encoding
alphabets, the channel functions $h_t^1, h_t^2$, the distortion functions
$\rho_t$ and a time horizon $T$, the objective is to find globally
optimal encoding, decoding and memory update functions
$f_{1:T}^1, f_{1:T}^2, g_{1:T}, l_{1:T}^1, l_{1:T}^2$ so as to minimize

$$J(f_{1:T}^1, f_{1:T}^2, g_{1:T}, l_{1:T}^1, l_{1:T}^2) = \mathbb{E}\left\{ \sum_{t=1}^{T} \rho_t(X_t, \hat{X}_t) \right\} \tag{13}$$

where the expectation in (13) is over the joint distribution
of $X_{1:T}$ and $\hat{X}_{1:T}$, which is determined by the given source
and noise statistics and the choice of encoding, decoding and
memory update functions $f_{1:T}^1, f_{1:T}^2, g_{1:T}, l_{1:T}^1, l_{1:T}^2$.
Fig. 2. Problem P1



A. Features of the Model 

We discuss situations that give rise to models similar to that
of Problem P1.

1. A Sensor Network: Consider a sensor network where
the sensors' observations are influenced by a slowly varying
global parameter and varying local phenomena. Our model
is an approximation of this situation where $A$ models the
global parameter that is constant over the time-horizon $T$ and
$X_t^i$ are the local factors at the location of the $i$-th sensor at
time $t$. A finite memory assumption on the receiver may be
justified in situations where the receiver is itself a node in the
network and is coordinating the individual sensors. We will
show that this assumption implies that the sensors (encoders
in our model) themselves can operate on finite-dimensional
sufficient statistics without losing any optimality with respect
to sensors with perfect (infinite) memory.

2. Decentralized Detection/Estimation Problem: Consider
the following scenario of a decentralized detection problem:
Sensors make noisy observations $X_t^i$ on the state $A$ of the
environment. Sensors must encode their information in real-time
and send it to a fusion center. Assuming that sensor noises
are independent, we have that, conditioned on $A$, the sensor
observations are independent. (Typically, the observations are
also assumed to be i.i.d. in time conditioned on the state of
the environment, but we allow them to be Markov.) Thus, the
encoding rule for the $i$-th sensor must be of the form:

$$Z_t^i = f_t^i(X_{1:t}^i, Z_{1:t-1}^i)$$

Consider the case where $Z_t^i$ can either be "blank" or a value
from the set $\mathcal{A}$. Each sensor is restricted to send only one
non-blank message, and within a fixed time-horizon each sensor
must send its final non-blank message. When a sensor sends
a non-blank message $Z_t^i$, the fusion center receives a noisy
version $Y_t^i$ of this message. As long as the fusion center
does not receive final (non-blank) messages from all sensors,
its decision is $\hat{X}_t$ = "no decision" and the system incurs a
constant penalty $c$ (for delaying the final decision on $A$). If
all sensors have sent a non-blank message, the fusion center
produces an estimate $\hat{X}_t \in \mathcal{A}$ as its final estimate on $A$
and incurs a distortion cost $\rho(A, \hat{X}_t)$. Thus, we can view the
receiver as maintaining a separate memory for messages from
each sensor which is initialized to "blank" and updated as
follows:

$$M_t^i = \begin{cases} Y_t^i & \text{if } M_{t-1}^i \text{ was "blank"} \\ M_{t-1}^i & \text{otherwise} \end{cases} \tag{14}$$

The receiver's decision is $\hat{X}_t$ = "no decision" if $Y_t^i = M_{t-1}^i$ = "blank"
for some sensor $i$; else the receiver uses
a function $g_t$ to find an estimate

$$\hat{X}_t = g_t(Y_t^1, Y_t^2, M_{t-1}^1, M_{t-1}^2) \tag{15}$$

The above detection problem therefore is a special case of our
model with fixed memory update rules from (14).
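
The fusion-center behaviour in this special case can be sketched as follows, assuming that each memory cell simply keeps the first non-blank symbol received from its sensor. The names `BLANK`, `update_memory`, `receiver_decision` and the `estimator` callback are hypothetical and introduced only for this illustration.

```python
BLANK = "blank"


def update_memory(prev_memory, received):
    """Keep the first non-blank symbol received from a sensor."""
    return received if prev_memory == BLANK else prev_memory


def receiver_decision(received, memories, estimator):
    """Decide at time t from current symbols and memories from time t-1.

    Returns "no decision" while some sensor has neither a stored message
    nor a non-blank symbol at the current time; otherwise defers to the
    supplied estimator g_t(received, memories).
    """
    for y, m in zip(received, memories):
        if y == BLANK and m == BLANK:
            return "no decision"
    return estimator(received, memories)
```

Because the update rule is fixed in advance, only the encoders and the final estimator remain to be optimized in this example.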

Clearly, our model also includes the case when the en- 
coders' observations are independent Markov chains (not just 
conditionally independent). In this case, the coupling between 
encoders is only due to the fact that the receiver may be interested
in estimating some function of the state of the two Markov 
chains and not their respective individual states. 

B. Structure Result for Encoding Functions 

We define the following probability mass functions (pmf)
for encoder $i$ ($i = 1,2$):

Definition 1: For $t = 1,2,\ldots,T$ and $a \in \mathcal{A}$,

$$b_t^i(a) := P(A = a \mid X_{1:t}^i)$$

Definition 2: For $t = 2,3,\ldots,T$ and $m \in \mathcal{M}^i$,

$$\mu_t^i(m) := P(M_{t-1}^i = m \mid Z_{1:t-1}^i, l_{1:t-1}^i)$$

where $l_{1:t-1}^i$ in the conditioning indicates that $\mu_t^i$ is defined for a
fixed choice of the memory update rules $l_{1:t-1}^i$. For notational
convenience, we also take $\mu_1^i(m)$ to be a fixed constant for each
$m \in \mathcal{M}^i$, $i = 1,2$.

Theorem 1: There exist globally optimal encoding rules of
the form:

$$Z_t^i = f_t^i(X_t^i, b_t^i, \mu_t^i) \tag{16}$$

where $f_t^i$ are deterministic functions for $t = 1,2,\ldots,T$ and
$i = 1,2$.

Discussion: In contrast to equation (2), Theorem 1 says that
an optimal encoder 1 only needs to use the current observation
$X_t^1$ and the probability mass functions $b_t^1$, $\mu_t^1$ that act as a
compressed representation of the past observations $X_{1:t-1}^1$ and
$Z_{1:t-1}^1$. These pmfs represent encoder 1's belief on $A$ and
$M_{t-1}^1$.

To obtain the result of Theorem 1 for encoder 1,
we fix arbitrary encoding rules for encoder 2 of the
form in (3), arbitrary memory update rules of the form in
(11) and arbitrary decoding rules of the form in (12). Given
these functions, we consider the problem of selecting optimal
encoding rules for encoder 1. We identify a structural property
of the optimal encoding rules of encoder 1 that is independent
of the arbitrary choice of strategies for encoder 2 and the
receiver. We conclude that the identified structure of optimal
rules of encoder 1 must also be true when encoder 2 and the
receiver are using the globally optimal strategies. Hence, the
identified structure is true for globally optimal encoding rules
of encoder 1. We now present this argument in detail.

Consider arbitrary (but fixed) encoding rules for encoder 2
of the form in (3), arbitrary memory update rules for the
receiver of the form in (11) and arbitrary decoding rules of the
form in (12). We will prove Theorem 1 using the following
lemmas.

Lemma 1: The belief of the first encoder about the random
variable $A$ can be updated as follows:

$$b_t^1 = \alpha_t^1(b_{t-1}^1, X_t^1, X_{t-1}^1) \tag{17}$$

where $\alpha_t^1$, $t = 2,3,\ldots,T$ are deterministic functions.

Proof: See Appendix A. ■
Lemma 2: The belief of the first encoder about the receiver
memory $M_{t-1}^1$ can be updated as follows:

$$\mu_t^1 = \beta_t^1(\mu_{t-1}^1, Z_{t-1}^1) \tag{18}$$

where $\beta_t^1$, $t = 2,3,\ldots,T$ are deterministic functions.

Proof: See Appendix B. ■

Define the following random variables:

$$R_t^1 := (X_t^1, b_t^1, \mu_t^1) \tag{19}$$

for $t = 1,2,\ldots,T$.

Observe that $R_t^1$ is a function of encoder 1's observations till
time $t$, that is, $X_{1:t}^1, Z_{1:t-1}^1$. Moreover, any encoding rule of
the form in (2) can also be written as

$$Z_t^1 = f_t^1(R_{1:t}^1, Z_{1:t-1}^1)$$
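
Lemmas 1 and 2 assert that recursive updates exist and defer the proofs to the appendices. For finite alphabets, one way such updates could be computed is sketched below: Bayes' rule for the belief on A, using the fact that the encoder's observations form a Markov chain given A, and a one-step propagation through the channel and the receiver's memory update rule for the belief on the memory. The function names, the table layouts `p_trans[a][x_prev][x_curr]` and `channel[z][y]` (the latter standing for the transition probabilities induced by the channel function and the noise statistics) are assumptions of this sketch, not notation from the paper.

```python
def update_belief_on_a(prev_belief, x_prev, x_curr, p_trans):
    """One Bayes step: b_t(a) is proportional to
    b_{t-1}(a) * P(X_t = x_curr | X_{t-1} = x_prev, A = a)."""
    unnorm = [b * p_trans[a][x_prev][x_curr] for a, b in enumerate(prev_belief)]
    total = sum(unnorm)
    if total == 0:
        raise ValueError("observed transition has zero probability under the belief")
    return [u / total for u in unnorm]


def update_belief_on_memory(prev_belief, z, channel, memory_update):
    """Propagate the encoder's belief on the receiver memory by one step.

    prev_belief[m]      : P(M_{t-1} = m | z_{1:t-1})
    z                   : symbol sent at time t
    channel[z][y]       : P(Y_t = y | Z_t = z)
    memory_update(m, y) : the receiver's fixed update rule at time t
    Returns the pmf of M_t given z_{1:t}.
    """
    new_belief = [0.0] * len(prev_belief)
    for m, p_m in enumerate(prev_belief):
        for y, p_y in enumerate(channel[z]):
            new_belief[memory_update(m, y)] += p_m * p_y
    return new_belief
```

Both updates use only the previous belief and the most recent observation or transmitted symbol, which is what allows the triple of current observation and the two beliefs to be tracked without storing the full history.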



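Lemma 3: $R_t^1$, $t = 1,2,\ldots,T$ is a perfectly observed
controlled Markov process for encoder 1 with $Z_t^1$ as the control
action at time $t$.

Proof: Since $R_t^1$ is a function of encoder 1's observations
till time $t$, that is, $X_{1:t}^1, Z_{1:t-1}^1$, it is perfectly observed at
encoder 1.

Let $x_{1:t}^1, z_{1:t-1}^1$ be a realization of encoder 1's observations
$X_{1:t}^1, Z_{1:t-1}^1$. Similarly, let $r_t^1$ be a realization of $R_t^1$, and let $\tilde{b}_t^1$
and $\tilde{\mu}_t^1$ be realizations of $b_t^1$ and $\mu_t^1$ respectively. Then,

\begin{align}
&P\big(R_{t+1}^1 = (x_{t+1}^1, \tilde{b}_{t+1}^1, \tilde{\mu}_{t+1}^1) \mid r_{1:t}^1, z_{1:t}^1\big) \notag \\
&= P\big(x_{t+1}^1, \tilde{b}_{t+1}^1, \tilde{\mu}_{t+1}^1 \mid x_{1:t}^1, \tilde{b}_{1:t}^1, \tilde{\mu}_{1:t}^1, z_{1:t}^1\big) \notag \\
&= P\big(\tilde{b}_{t+1}^1, \tilde{\mu}_{t+1}^1 \mid x_{t+1}^1, x_{1:t}^1, \tilde{b}_{1:t}^1, \tilde{\mu}_{1:t}^1, z_{1:t}^1\big) \notag \\
&\quad \times P\big(x_{t+1}^1 \mid x_{1:t}^1, \tilde{b}_{1:t}^1, \tilde{\mu}_{1:t}^1, z_{1:t}^1\big) \tag{20} \\
&= P\big(\tilde{b}_{t+1}^1, \tilde{\mu}_{t+1}^1 \mid x_{t+1}^1, x_t^1, \tilde{b}_t^1, \tilde{\mu}_t^1, z_t^1\big) \notag \\
&\quad \times P\big(x_{t+1}^1 \mid x_{1:t}^1, \tilde{b}_{1:t}^1, \tilde{\mu}_{1:t}^1, z_{1:t}^1\big) \tag{21}
\end{align}

where the first term in (21) is true because of Lemma 1
and Lemma 2. Consider the second term in (21). It can be
expressed as follows:

\begin{align}
&P\big(x_{t+1}^1 \mid x_{1:t}^1, \tilde{b}_{1:t}^1, \tilde{\mu}_{1:t}^1, z_{1:t}^1\big) \notag \\
&= \sum_{a \in \mathcal{A}} P\big(x_{t+1}^1, A = a \mid x_{1:t}^1, \tilde{b}_{1:t}^1, \tilde{\mu}_{1:t}^1, z_{1:t}^1\big) \tag{22} \\
&= \sum_{a \in \mathcal{A}} P\big(x_{t+1}^1 \mid A = a, x_{1:t}^1, \tilde{b}_{1:t}^1, \tilde{\mu}_{1:t}^1, z_{1:t}^1\big) \notag \\
&\quad \times P\big(A = a \mid x_{1:t}^1, \tilde{b}_{1:t}^1, \tilde{\mu}_{1:t}^1, z_{1:t}^1\big) \tag{23} \\
&= \sum_{a \in \mathcal{A}} P\big(x_{t+1}^1 \mid A = a, x_t^1\big)\, \tilde{b}_t^1(a) \tag{24}
\end{align}

where the first term in (24) is true because of the Markov
property of $X_t^1$ when conditioned on $A$. Therefore, substituting
(24) in (21), we get

\begin{align}
&P\big(R_{t+1}^1 = (x_{t+1}^1, \tilde{b}_{t+1}^1, \tilde{\mu}_{t+1}^1) \mid r_{1:t}^1, z_{1:t}^1\big) \notag \\
&= P\big(\tilde{b}_{t+1}^1, \tilde{\mu}_{t+1}^1 \mid x_{t+1}^1, x_t^1, \tilde{b}_t^1, \tilde{\mu}_t^1, z_t^1\big) \notag \\
&\quad \times \sum_{a \in \mathcal{A}} \big[ P\big(x_{t+1}^1 \mid A = a, x_t^1\big) \times \tilde{b}_t^1(a) \big] \tag{25}
\end{align}

The right hand side of (25) depends only on $x_t^1$, $\tilde{b}_t^1$, $\tilde{\mu}_t^1$ and $z_t^1$
from the entire collection of conditioning variables in the left
hand side of (25). Hence,

\begin{align}
P\big(R_{t+1}^1 \mid r_{1:t}^1, z_{1:t}^1\big) &= P\big(R_{t+1}^1 \mid x_{1:t}^1, \tilde{b}_{1:t}^1, \tilde{\mu}_{1:t}^1, z_{1:t}^1\big) \notag \\
&= P\big(R_{t+1}^1 \mid x_t^1, \tilde{b}_t^1, \tilde{\mu}_t^1, z_t^1\big) \notag \\
&= P\big(R_{t+1}^1 \mid r_t^1, z_t^1\big) \tag{26}
\end{align}

This establishes the Lemma. ■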
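Lemma 4: The expected instantaneous distortion cost for
encoder 1 can be expressed as:

$$\mathbb{E}\big\{ \rho_t(X_t, \hat{X}_t) \mid X_{1:t}^1, Z_{1:t}^1 \big\} = \hat{\rho}_t(R_t^1, Z_t^1) \tag{27}$$

where $\hat{\rho}_t$, $t = 1,2,\ldots,T$ are deterministic functions.

Proof: For any realization $x_{1:t}^1, z_{1:t}^1$ of $X_{1:t}^1, Z_{1:t}^1$, we have

\begin{align}
&\mathbb{E}\big\{ \rho_t(X_t, \hat{X}_t) \mid x_{1:t}^1, z_{1:t}^1 \big\} \notag \\
&= \mathbb{E}\big\{ \rho_t\big(x_t^1, X_t^2, A, g_t(Y_t^1, Y_t^2, M_{t-1}^1, M_{t-1}^2)\big) \mid x_{1:t}^1, z_{1:t}^1 \big\} \tag{28}
\end{align}

The expectation in (28) depends on $x_t^1$ (appearing in
the argument of $\rho_t$) and the conditional probability:

$$P\big(X_t^2, A, Y_t^1, Y_t^2, M_{t-1}^1, M_{t-1}^2 \mid x_{1:t}^1, z_{1:t}^1\big) \tag{29}$$

We can evaluate this conditional probability as follows:

\begin{align}
&P\big(X_t^2 = x_t^2, A = a, Y_t^1 = y_t^1, Y_t^2 = y_t^2, M_{t-1}^1 = m_{t-1}^1, M_{t-1}^2 = m_{t-1}^2 \mid x_{1:t}^1, z_{1:t}^1\big) \notag \\
&= P\big(X_t^2 = x_t^2, Y_t^2 = y_t^2, M_{t-1}^2 = m_{t-1}^2 \mid A = a, Y_t^1 = y_t^1, M_{t-1}^1 = m_{t-1}^1, x_{1:t}^1, z_{1:t}^1\big) \notag \\
&\quad \times P\big(Y_t^1 = y_t^1 \mid A = a, M_{t-1}^1 = m_{t-1}^1, x_{1:t}^1, z_{1:t}^1\big) \notag \\
&\quad \times P\big(M_{t-1}^1 = m_{t-1}^1 \mid A = a, x_{1:t}^1, z_{1:t}^1\big) \notag \\
&\quad \times P\big(A = a \mid x_{1:t}^1, z_{1:t}^1\big) \tag{30} \\
&= P\big(X_t^2 = x_t^2, Y_t^2 = y_t^2, M_{t-1}^2 = m_{t-1}^2 \mid A = a\big) \notag \\
&\quad \times P\big(Y_t^1 = y_t^1 \mid z_t^1\big) \times P\big(M_{t-1}^1 = m_{t-1}^1 \mid z_{1:t-1}^1\big) \times P\big(A = a \mid x_{1:t}^1\big) \tag{31} \\
&= P\big(X_t^2 = x_t^2, Y_t^2 = y_t^2, M_{t-1}^2 = m_{t-1}^2 \mid A = a\big) \notag \\
&\quad \times P\big(Y_t^1 = y_t^1 \mid z_t^1\big) \times \tilde{\mu}_t^1(m_{t-1}^1) \times \tilde{b}_t^1(a) \tag{32}
\end{align}

In the first term of (31), we used the fact that conditioned
on $A$, the observations of encoder 2 and received messages
from the second channel are independent of the observations
of encoder 1 and the messages received from the first channel.
We used the fact that the noise variables are i.i.d. and
independent of the source in the second and third term of
(31). Thus, the conditional probability in (29) depends only on
$z_t^1$, $\tilde{\mu}_t^1$ and $\tilde{b}_t^1$. Therefore, the expectation in (28) is a function
of $x_t^1, z_t^1, \tilde{\mu}_t^1, \tilde{b}_t^1$. That is,

\begin{align}
\mathbb{E}\big\{ \rho_t(X_t, \hat{X}_t) \mid x_{1:t}^1, z_{1:t}^1 \big\} &= \hat{\rho}_t(x_t^1, z_t^1, \tilde{\mu}_t^1, \tilde{b}_t^1) \tag{33} \\
&= \hat{\rho}_t(r_t^1, z_t^1) \tag{34}
\end{align}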

Proof: [Proof of Theorem 1] From Lemma 3 and Lemma
4, we conclude that the optimization problem for encoder 1,
when the strategies of encoder 2 and the receiver have been
fixed, is equivalent to controlling the transition probabilities
of the controlled Markov chain $R^1_t$ through the choice of the
control actions $Z^1_t$ (where $Z^1_t$ can be any function of $R^1_{1:t}$ and
$Z^1_{1:t-1}$) in order to minimize $\sum_{t=1}^{T} E\{\hat{\rho}_t(R^1_t, Z^1_t)\}$. It is a
well-known result of Markov decision theory ([32], Chapter
6) that there is an optimal control law of the form:

$$Z^1_t = f^1_t(R^1_t)$$

or equivalently,

$$Z^1_t = f^1_t(X^1_t, b^1_t, \mu^1_t)$$



Moreover, it also follows from Markov decision theory that
allowing randomized control policies for encoder 1 cannot
provide any performance gain. Since the above structure of the
optimal choice of encoder 1's strategy is true for any arbitrary
choice of encoder 2's and the receiver's strategies, we conclude
that the above structure of optimal encoder 1 is true when
encoder 2 and the receiver are using their globally optimal
choices as well. Therefore, the above structure is true for a
globally optimal strategy of encoder 1 as well. This completes
the proof of Theorem 1. The structural result for encoder 2 follows
from the same arguments simply by interchanging the roles of
encoder 1 and encoder 2. ■

C. Structural result for Decoding Functions 

We now present the structure of an optimal decoding
strategy. Consider fixed encoding rules of the form in (2) and
(3) and fixed memory update rules of the form in (11). We
define the following probability mass function for the receiver.

Definition 3: For $x \in \mathcal{X}$ and $t = 1, 2, \ldots, T$,

$$\psi_t(x) := P(X_t = x \mid Y^1_t, Y^2_t, M^1_{t-1}, M^2_{t-1}, f^1_{1:t}, f^2_{1:t}, l^1_{1:t}, l^2_{1:t})$$

where the functions $f^1_{1:t}, f^2_{1:t}, l^1_{1:t}, l^2_{1:t}$ in the conditioning
indicate that $\psi_t$ is defined for a fixed choice of encoding and
memory update strategies.

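Let $\Delta(\mathcal{X})$ denote the set of probability mass functions on
the finite set $\mathcal{X}$. We define the following functions on $\Delta(\mathcal{X})$.

Definition 4: For any $\psi \in \Delta(\mathcal{X})$ and $t = 1, 2, \ldots, T$,

$$\tau_t(\psi) := \arg\min_{s \in \mathcal{X}} \sum_{x \in \mathcal{X}} \psi(x) \rho_t(x, s)$$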



With the above definitions, we can present the result on
the structure of a globally optimal decoding rule.

Theorem 2: For any fixed encoding rules of the form in (2)
and (3) and memory update rules of the form in (11), there is
an optimal decoding rule of the form

$$\hat{X}_t = \tau_t(\psi_t) \tag{35}$$

where the belief $\psi_t$ is formed using the fixed encoding and
memory update rules. In particular, equation (35) is true for a
globally optimal receiver, when the fixed encoding rules and
memory update rules are the globally optimal rules.

Proof: In order to minimize the expected total accumulated
distortion, the receiver must minimize the expected
distortion at each time $t$. Clearly, the definitions of the function
$\tau_t$ and the belief $\psi_t$ imply that $\tau_t(\psi_t)$ achieves the minimum
expected distortion at time $t$ (see [25]). ■
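
The decoding rule of Definition 4 can be illustrated with a short Python sketch. It assumes the belief and the distortion function are given as plain lists indexed by integers; the function name `optimal_estimate` and the example numbers are hypothetical and not part of the original development.

```python
def optimal_estimate(psi, rho):
    """Return the estimate s minimizing sum_x psi[x] * rho[x][s].

    psi : list of probabilities, psi[x] = belief that the source state is x
    rho : distortion table, rho[x][s] = cost of estimating s when the state is x
    (Both containers are illustrative assumptions.)
    """
    n_estimates = len(rho[0])
    # Expected distortion of each candidate estimate under the belief psi.
    expected_cost = [
        sum(psi[x] * rho[x][s] for x in range(len(psi)))
        for s in range(n_estimates)
    ]
    # Ties are broken in favor of the smallest index.
    return min(range(n_estimates), key=expected_cost.__getitem__)


# Hamming distortion on three states: the minimizer is the most likely state.
hamming = [[0.0 if x == s else 1.0 for s in range(3)] for x in range(3)]
print(optimal_estimate([0.2, 0.5, 0.3], hamming))
```

With Hamming distortion the rule reduces to picking the most probable state; with squared-error distortion on integer states it picks the state closest to the conditional mean in expected squared error.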

D. Discussion of the Result 

Theorem 1 identifies sufficient statistics for the encoders. Instead
of storing all past observations and transmitted messages,
each encoder may store only the probability mass functions
(pmf) on the finite sets $\mathcal{A}$ and $\mathcal{M}^i$ generated from past
observations and transmitted messages. Thus we have finite-dimensional
sufficient statistics for the encoders that belong
to time-invariant spaces (the space of pmfs on $\mathcal{A}$ and $\mathcal{M}^i$).
Clearly, this amounts to storing a fixed number of real numbers
in the memory of each encoder instead of arbitrarily large
sequences of past observations and past transmitted symbols.
However, the encoders now have to incur an additional computational
burden involved in updating their beliefs on $A$ and
the receiver memory.
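
As a rough illustration of this computational burden, the following Python sketch performs one step of each belief update for encoder 1. It assumes the Bayes-rule recursions implied by the model: the belief on $A$ is reweighted by the transition probability $P(x^1_{t+1} \mid A = a, x^1_t)$, and the belief on the receiver memory is pushed through the channel and the memory update function. All names and the table layouts (`trans`, `channel`, `mem_update`) are hypothetical.

```python
def update_belief_on_A(b, x_prev, x_next, trans):
    """One Bayes step for b_t(a) = P(A = a | x_{1:t}).

    trans[a][x][x2] is assumed to hold P(X_{t+1} = x2 | X_t = x, A = a).
    """
    unnormalized = [b[a] * trans[a][x_prev][x_next] for a in range(len(b))]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observed transition has zero probability under b")
    return [u / total for u in unnormalized]


def update_belief_on_memory(mu, z, channel, mem_update, n_mem):
    """One step for mu_t(m) = P(M_{t-1} = m | z_{1:t-1}).

    channel[z][y] is assumed to hold P(Y = y | Z = z), and
    mem_update(m, y) returns the index of the next memory state.
    """
    new_mu = [0.0] * n_mem
    for m, p_m in enumerate(mu):
        for y, p_y in enumerate(channel[z]):
            new_mu[mem_update(m, y)] += p_m * p_y
    return new_mu


# Example: A selects between a "sticky" and a uniform binary chain.
trans = [
    [[0.9, 0.1], [0.1, 0.9]],  # A = 0
    [[0.5, 0.5], [0.5, 0.5]],  # A = 1
]
print(update_belief_on_A([0.5, 0.5], 0, 0, trans))

# Example: a receiver that remembers only the last received symbol.
bsc = [[0.8, 0.2], [0.2, 0.8]]
print(update_belief_on_memory([1.0, 0.0], 0, bsc, lambda m, y: y, 2))
```

Both updates touch only a fixed number of real numbers per step, which is the point of the finite-dimensional statistic.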

We would like to emphasize that the presence of a finite
dimensional sufficient statistic that belongs to time-invariant
spaces is strongly dependent on the nature of the source and
the receiver. Indeed, without the conditionally independent
nature of the encoders' observations or the separated finite
memories at the receiver, we have not been able to identify a
sufficient statistic whose domain does not keep increasing with
time. For example, if the finite memory receiver maintained a
coupled memory which is updated as:

$$M_t = l_t(M_{t-1}, Y^1_t, Y^2_t)$$

then one may conjecture that the encoder could use a belief
on $M_{t-1}$ as a sufficient representation of past transmitted
symbols, analogous to $\mu^1_t$ in Theorem 1. However, such a
statistic cannot be updated without remembering all past data,
that is, an update equation analogous to Lemma 2 for $\mu^1_t$
does not hold. This implies that the Markov decision-theoretic
arguments of Theorem 1 do not work for this case.
In the case when encoders' observations have a more general
correlation structure, a finite dimensional statistic like $b^1_t$ that
compresses all the past observations seems unlikely. It appears
that in the absence of the assumptions mentioned above, the
optimal encoders should remember all their past information.

If the receiver has perfect memory, that is, it remembers
all past messages received ($M^i_{t-1} = Y^i_{1:t-1}$, $i = 1, 2$),
Theorem 1 implies $\mu^i_t = P(Y^i_{1:t-1} \mid Z^i_{1:t-1})$ as a part of the
sufficient statistic for encoder $i$. Thus, Theorem 1 says that
each encoder needs to store beliefs on the increasing space
of all past observations at the receiver. This sufficient statistic
does not belong to a time-invariant space. In the next section,
we will consider this problem with noiseless channels and
show that for noiseless channels there is in fact a sufficient
statistic that belongs to a time-invariant space. However, this
sufficient statistic is no longer finite dimensional and for
implementation purposes, one would have to come up with
approximate representations of it.

IV. Problem P2 

We now look at Problem P1 with noiseless channels.
Firstly, we assume the same model for the nature of the source
and the separated memories at the receiver as in Problem
P1. The result of Theorem 1 holds with the belief on $M^i_{t-1}$
replaced by the true value of $M^i_{t-1}$. The presence of noiseless
channels implies that encoder $i$ and the receiver have some
common information. That is, at time $t$ they both know the
state of $M^i_{t-1}$. The presence of common information among
agents of a team allows for new ways of optimizing the
team objective ([33]). In this section, we will show that the
presence of common information allows us to explore the
case when the receiver may have perfect memory. We will
present a new methodology that exploits the presence of
common information between the encoder and the receiver to
find sufficient statistics for the encoders that belong to time-invariant
spaces (spaces that do not keep growing with time).

A. Problem Formulation 

1) The Model: We consider the same model as in P1 with
the following two modifications:

i. The channels are noiseless; thus the received symbol
$Y^i_t$ is the same as the transmitted symbol $Z^i_t$, for $i = 1, 2$
and $t = 1, 2, \ldots, T$.

ii. The receiver has perfect memory, that is, it remembers
all the past received symbols. Thus, $M^i_{t-1} = Z^i_{1:t-1}$,
for $i = 1, 2$ and $t = 2, 3, \ldots, T$. (See Fig. 3)

2) The Optimization Problem, P2: Given the source statistics,
the encoding alphabets, the time horizon $T$, the
distortion functions $\rho_t$, the objective is to find globally
optimal encoding and decoding functions $f^1_{1:T}, f^2_{1:T}, g_{1:T}$
so as to minimize

$$J(f^1_{1:T}, f^2_{1:T}, g_{1:T}) = E\left[\sum_{t=1}^{T} \rho_t(X_t, \hat{X}_t)\right] \tag{36}$$

where the expectation in (36) is over the joint distribution
of $X_{1:T}$ and $\hat{X}_{1:T}$ which is determined by the given
source statistics and the choice of encoding and decoding
functions $f^1_{1:T}, f^2_{1:T}, g_{1:T}$.

B. Structure of the Receiver 

Clearly, problem P2 is a special case of problem P1. The
decoder structure of P1 can now be restated for P2 as follows:
For fixed encoding rules of the form in (2) and (3), we can
define the receiver's belief on the source as:

$$\psi_t(x) := P(X_t = x \mid Z^1_{1:t}, Z^2_{1:t}, f^1_{1:t}, f^2_{1:t})$$

for $x \in \mathcal{X}$ and $t = 1, 2, \ldots, T$.

[Figure: block diagram of the Markov source, Encoder 1, Encoder 2 and the receiver producing $\hat{X}_t$]

Fig. 3. Problem P2

Theorem 3: For any fixed encoding rules of the form in (2)
and (3), there is an optimal decoding rule of the form

$$\hat{X}_t = \tau_t(\psi_t) \tag{37}$$

where the belief $\psi_t$ is formed using the fixed encoding rules
and $\tau_t$ is as defined in Definition 4. In particular, equation (37)
is true for a globally optimal receiver, when the fixed encoding
rules are globally optimal rules.

C. Structural Result for Encoding Functions 

For a fixed realization of $Z^i_{1:t-1}$, encoder $i$'s belief on the
receiver memory $M^i_{t-1}$ is simply:

$$\mu^i_t(m) = \begin{cases} 1 & \text{if } m = z^i_{1:t-1} \\ 0 & \text{otherwise} \end{cases} \tag{38}$$

Therefore, using Theorem 1, we conclude that there is a
globally optimal encoder of the form:

$$Z^i_t = f^i_t(X^i_t, b^i_t, \mu^i_t)$$

for $t = 1, 2, \ldots, T$ and $i = 1, 2$.
Or equivalently,

$$Z^i_t = f^i_t(X^i_t, b^i_t, Z^i_{1:t-1}) \tag{39}$$

Observe that the domain of the encoding functions in (39)
keeps increasing with time since it includes all past transmitted
symbols. We would like to find a sufficient statistic that
belongs to a time-invariant space. Such a statistic would allow
us to address problems with large (or infinite) time horizons.

For that matter, let us first review the approach used for
obtaining the first structural result for the encoders (Theorem
1). We fixed the strategy of encoder 2 and the receiver to any
arbitrary choice and looked at the optimization problem P1
from encoder 1's perspective. Essentially, we addressed the
following question: if encoder 2 and the receiver have fixed
their strategies, how can we characterize the best strategy of
encoder 1 in response to the other agents' fixed strategies?
In other words, with $f^2_{1:T}$ and $g_{1:T}$ as fixed, what kind of
strategies of encoder 1 ($f^1_{1:T}$) minimize the objective in equation
(13)? This approach allowed us to formulate a Markov
decision problem for encoder 1. The Markov decision problem
gave us a sufficient statistic for encoder 1 that holds for any
choice of strategies of encoder 2 and the receiver and this led
to the result of Theorem 1. In problem P2, such an approach
gives the result of equation (39), which implies a sufficient
statistic whose domain keeps increasing with time.

To proceed further, we need to adopt a different approach.
As before, we will make an arbitrary choice of encoder 2's
strategy of the form in (3). Given this fixed encoder 2, we will
now ask, what are the jointly optimal strategies for encoder 1
and the receiver? That is, assuming $f^2_{1:T}$ is fixed, what choice
of $f^1_{1:T}$ and $g_{1:T}$ together minimize the objective in equation
(36)? From our previous structural results, we know that we
can restrict to encoding rules $f^1_{1:T}$ of the form in (39) and
decoding rules from Theorem 3 without any loss of optimality.
We thus have the following problem:

Problem P2': In Problem P2, with encoder 2's strategy fixed
to an arbitrary choice $f^2_{1:T}$, find the jointly optimal strategies
of encoder 1 of the form in (39) and of the receiver in Theorem
3 to minimize

$$J(f^1_{1:T}, f^2_{1:T}, g_{1:T}) = E\left[\sum_{t=1}^{T} \rho_t(X_t, \hat{X}_t)\right]$$

Problem P2' is in some sense a real-time point-to-point 
communication problem with side information at the receiver. 
This is now a decentralized team problem with the first 
encoder and the receiver as the two agents. Note that encoder 1 
influences the decisions at the receiver not only by the symbols 
it sends but by the entire encoding functions it employs
(since the receiver's belief $\psi_t$ depends on the choice of
encoding functions $f^1_{1:t}$). A general way to solve such dynamic
team problems is to search through the space of all strategies 
to identify the best choice. For our problem (and for many 
team problems), this is not a useful approach for two reasons: 
1) Complexity - the space of all strategies is clearly too large 
even for small time horizons, thus making a brute force search 
prohibitive. 2) More importantly, such a method does not 
reveal any characteristic of the optimal strategies and does 
not lead to the identification of a sufficient statistic. We will 
therefore adopt a different philosophy to address our problem. 
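
To give a sense of the complexity argument, the following Python sketch counts deterministic encoding strategies for a single encoder. It assumes the general form $Z_t = f_t(X_{1:t}, Z_{1:t-1})$ with finite alphabets, so that $f_t$ is one of $|\mathcal{Z}|^{|\mathcal{X}|^t |\mathcal{Z}|^{t-1}}$ possible maps; the function name is hypothetical.

```python
def num_encoding_strategies(n_x, n_z, horizon):
    """Count tuples (f_1, ..., f_T) of deterministic encoders.

    f_t maps (x_{1:t}, z_{1:t-1}) to a symbol, so its domain has
    n_x**t * n_z**(t-1) points and there are n_z**(domain size) choices.
    """
    domain_sizes = [n_x**t * n_z ** (t - 1) for t in range(1, horizon + 1)]
    return n_z ** sum(domain_sizes)


# Binary source and binary channel alphabet.
for horizon in (1, 2, 3, 4):
    count = num_encoding_strategies(2, 2, horizon)
    print(horizon, len(str(count)), "decimal digits")
```

Even in this binary example the count grows doubly exponentially in the horizon, before the receiver's strategies are taken into account.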

Our approach is to first consider a modified version of 
problem P2' . We will construct this modified problem in such 
a way so as to ensure that: 

(a) The new problem is a single agent problem instead of a 
team problem. Single agent centralized problems (in certain 
cases) can be studied through the framework of Markov 
decision theory and dynamic programming. 

(b) The new problem is equivalent to the original team 
problem. We will show that the conclusions from the modified 
problem remain true for the problem P2' as well. 

We proceed as follows: 

Step 1: We introduce a centralized stochastic control problem
from the point of view of a fictitious agent who knows the
"common information" between encoder 1 and the receiver.
Step 2: We argue that the centralized problem of Step 1 
is equivalent to the original decentralized team problem. 
Step 3: We solve the centralized stochastic control problem 
by identifying an information state and employing dynamic 
programming arguments. The solution of this problem will 
reveal a sufficient statistic with a time-invariant domain for 
encoder 1. 



[Figure: block diagram showing the coordinator, Encoder 1, the source $X_t = (X^1_t, X^2_t, A)$ and the receiver]

Fig. 4. Coordinator's Problem P2"



Below, we elaborate on these steps. 

Step 1: We observe that the first encoder and the receiver
have some common information. At time $t$, they both know
$Z^1_{1:t-1}$. We now formulate a centralized problem from the
perspective of a fictitious agent that knows just the common
information $Z^1_{1:t-1}$. We call this fictitious agent the "coordinator"
(See Fig. 4).

The system operates as follows in this new problem: Based
on $Z^1_{1:t-1}$, the coordinator selects a partial-encoding function

$$w^1_t : \mathcal{X}^1 \times \Delta(\mathcal{A}) \to \mathcal{Z}^1$$

An encoding function of the form in (39) can be thought of
as a collection of mappings from $\mathcal{X}^1 \times \Delta(\mathcal{A})$ to $\mathcal{Z}^1$, one for
each realization of $Z^1_{1:t-1}$. Clearly, $w^1_t$ represents one such
mapping corresponding to the true realization of $Z^1_{1:t-1}$ that
was observed by the coordinator. (At $t = 1$, since there is
no past common information, the partial-encoding rule $w^1_1$ is
simply $f^1_1$, which is a mapping from $\mathcal{X}^1 \times \Delta(\mathcal{A})$ to $\mathcal{Z}^1$.)
The coordinator informs encoder 1 of its choice $w^1_t$.
Encoder 1 then uses $w^1_t$ on its observations $X^1_t$ and $b^1_t$ to find
the symbol to be transmitted, i.e.

$$Z^1_t = w^1_t(X^1_t, b^1_t) \tag{40}$$
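
The relation between an encoding function of the form (39) and the partial-encoding functions can be sketched in Python as partial application: fixing the past symbols in $f^1_t$ leaves a map of $(x, b)$ only. The rule `f` below is an arbitrary hypothetical encoder, and the belief $b$ is taken to be a single number (as in the binary-$A$ case) purely for brevity.

```python
def f(x, b, z_past):
    """Hypothetical encoding rule of the form (39): symbol from x, b and z_{1:t-1}."""
    parity = sum(z_past) % 2
    return (x + parity + (1 if b > 0.5 else 0)) % 2


def coordinator_select(z_past):
    """Selection rule: return the partial-encoding function w_t for this history."""
    frozen = tuple(z_past)  # the common information seen by the coordinator

    def w(x, b):
        return f(x, b, frozen)

    return w


def transmit_direct(xs, bs):
    """Encoder 1 applies f directly to its observations and past symbols."""
    zs = []
    for x, b in zip(xs, bs):
        zs.append(f(x, b, tuple(zs)))
    return zs


def transmit_via_coordinator(xs, bs):
    """The coordinator picks w_t from z_{1:t-1}; encoder 1 only evaluates w_t(x, b)."""
    zs = []
    for x, b in zip(xs, bs):
        w = coordinator_select(zs)
        zs.append(w(x, b))
    return zs


xs = [0, 1, 1, 0, 1]
bs = [0.2, 0.7, 0.6, 0.4, 0.9]
print(transmit_direct(xs, bs) == transmit_via_coordinator(xs, bs))
```

Both routines produce the same symbol sequence for any input, since the coordinator's choice depends only on symbols that encoder 1 already knows.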



The coordinator also informs the receiver of the partial-encoding
function. The receiver at each time $t$ forms its belief
on the state of the source based on the received symbols, the
partial-encoding functions and the fixed strategy of encoder 2.
This belief is

$$\psi_t(x) := P(X_t = x \mid z^1_{1:t}, z^2_{1:t}, w^1_{1:t}, f^2_{1:t})$$

for $x \in \mathcal{X}$. The receiver's optimal estimate at time $t$ is then
given as:

$$\hat{X}_t = \arg\min_{s \in \mathcal{X}} \sum_{x \in \mathcal{X}} \psi_t(x) \rho_t(x, s) \tag{41}$$

The coordinator then observes the symbol $Z^1_t$ sent from
encoder 1 to the receiver and then selects the partial-encoding
function for time $t+1$ ($w^1_{t+1}$). The system continues like this
from time $t = 1$ to $T$. The objective of the coordinator is to
minimize the performance criterion of equation (36), that is,
to minimize

$$E\left[\sum_{t=1}^{T} \rho_t(X_t, \hat{X}_t)\right]$$

We then have the following problem:

Problem P2": In Problem P2, with encoder 2's strategy fixed
to the same choice $f^2_{1:T}$ as in P2' and with a coordinator
between encoder 1 and the receiver as described above, find
an optimal selection rule for the coordinator, that is, find
the mappings $\Lambda_t$, $t = 1, 2, \ldots, T$ that map the coordinator's
information to its decision

$$w^1_t = \Lambda_t(Z^1_{1:t-1}, w^1_{1:t-1})$$

so as to minimize the total expected distortion over time $T$.
(Note we have included the past decisions ($w^1_{1:t-1}$) of the
coordinator in the argument of $\Lambda_t$ since they themselves are
functions of the past observations $Z^1_{1:t-1}$.)
Remark: Under a given selection rule for the coordinator, the
function $w^1_t$ is a random variable whose realization depends
on the realization of past $Z^1_{1:t-1}$ which, in turn, depends on the
realization of the source process and the past partial-encoding
functions.

Step 2: We now argue that the original team problem P2'
is equivalent to the problem in the presence of the coordinator
(Problem P2"). Specifically, we show that any achievable value
of the objective (that is, the total expected distortion over time
$T$) in problem P2' can also be achieved in problem P2" and
vice versa. Consider first any selection rule $\Lambda_t$, $t = 1, 2, \ldots, T$
for the coordinator. While introducing the coordinator in Step
1, we gave it a particular structure, namely, we said that
the coordinator only knows the common information between
encoder 1 and the receiver. This is crucial because it implies
that all information available at the coordinator is in fact
available to both encoder 1 and the receiver. Thus, the selection
rule $\Lambda_t$ of the coordinator can be used by both encoder 1 and
the receiver to determine the partial-encoding function, $w^1_t$, to
be used at time $t$ even when the coordinator is not actually
present. With encoder 2 fixed as before, the system operation
for the model in Problem P2' can now be described as follows:
At each time $t$, encoder 1 uses the selection rule $\Lambda_t$ to decide
the partial function to be used at time $t$; it then uses $w^1_t$ to
evaluate $Z^1_t$ as follows:

$$Z^1_t = w^1_t(X^1_t, b^1_t)$$

The receiver uses the same selection rule to find out what $w^1_t$ is
being used by encoder 1. It then uses the received symbols to
form its belief on the source and produce an estimate according
to equation (41). Therefore, the coordinator can effectively
be simulated by encoder 1 and the receiver, and hence any
achievable value of the objective in Problem P2" with the
coordinator can be achieved even in the absence of a physical
coordinator.

Conversely, in Problem P2' consider any strategy $f^1_{1:T}$ of
encoder 1 and the corresponding optimal receiver given by
Theorem 3. Now consider the following selection rule for
the coordinator in P2": At each time $t$, after having observed
$z^1_{1:t-1}$, the coordinator selects the following partial encoding
function:

$$w^1_t(\cdot) = f^1_t(\cdot, z^1_{1:t-1})$$

Then it is clear that for every realization of the source,
encoder 1 in Problem P2" will produce the same realization of
encoded symbols as encoder 1 of Problem P2'. Consequently
the above selection rule of the coordinator will induce the
same joint distribution $P(X_{1:T}, Z^1_{1:T}, Z^2_{1:T})$ as the encoding
rules $f^1_{1:T}$ for encoder 1 in problem P2'. Then the receivers in
Problem P2' and Problem P2" will have the same conditional
belief $\psi_t$ and will make the same estimates (given by Theorem
3 and equation (41) respectively). Thus any achievable value of
the objective in Problem P2' can also be achieved in Problem
P2".

The above equivalence allows us to focus on the coor- 
dinator's problem to solve the original problem P2'. We 
now argue that the coordinator's problem is in fact a single 
agent centralized problem for which Markov decision-theoretic 
results can be employed. 

Step 3: To further describe the coordinator's problem we 
need the following definition and lemma. 

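Definition 5: For $t = 1, 2, \ldots, T$, let $\xi^1_t$ be the coordinator's
belief on $X^1_t, b^1_t$. That is,

$$\xi^1_t(x^1_t, \tilde{b}^1_t) := P(X^1_t = x^1_t, b^1_t = \tilde{b}^1_t \mid Z^1_{1:t}, w^1_{1:t})$$

for $x^1_t \in \mathcal{X}^1$ and $\tilde{b}^1_t \in \Delta(\mathcal{A})$.

For notational convenience, we define $\xi^1_0 := 0$.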

Lemma 5: For a fixed strategy of encoder 2, there is an
optimal decoding rule of the form:

$$\hat{X}_t = \tau_t(\psi_t) = \tau_t(\delta_t(\xi^1_t, Z^2_{1:t})) \tag{42}$$

where $\delta_t$, $t = 1, 2, \ldots, T$ are fixed transformations that depend
only on source statistics and the fixed strategy of encoder 2
and $\tau_t$, $t = 1, 2, \ldots, T$ are the decoding functions as defined
in Definition 4.

Proof: See Appendix C. ■ 



From equations (40) and (42), it follows that in the coordinator's
problem P2", encoder 1 and the receiver are simply
implementors of fixed transformations. They do not make any
decisions. Thus, in this formulation, the coordinator is the sole
decision maker. We now analyze the centralized problem for
the coordinator.

Firstly, observe that at time $t$, the coordinator knows its
observations so far, $Z^1_{1:t-1}$, and the partial encoding functions
it used so far, $w^1_{1:t-1}$; it then selects an "action" $w^1_t$ and
makes the next "observation" $Z^1_t$. In particular, note that the
coordinator has perfect recall, that is, it remembers all its
past observations and actions. This is a critical characteristic
of classical centralized problems for which Markov decision-theoretic
results hold.

We can now prove the following lemma:

Lemma 6: 1) With a fixed strategy of the second encoder,
$\xi^1_t$ can be updated as follows:

$$\xi^1_t = \gamma^1_t(\xi^1_{t-1}, Z^1_t, w^1_t) \tag{43}$$

where $\gamma^1_t$, $t = 2, \ldots, T$ are fixed transformations that
depend only on the source statistics.
2) For a fixed strategy of the second encoder, the expected
instantaneous cost from the coordinator's perspective can
be written as:

$$E\{\rho_t(X_t, \hat{X}_t) \mid Z^1_{1:t}, w^1_{1:t}\} = \tilde{\rho}_t(\xi^1_t) \tag{44}$$

for $t = 1, 2, \ldots, T$, where $\tilde{\rho}_t$ are deterministic functions.
Proof: See Appendix D. ■

Based on Lemma 6, we obtain the following result on the 
coordinator's optimization problem. 

Theorem 4: For any given selection rule $\Lambda_t$, $t = 1, 2, \ldots, T$
for the coordinator, there exists another selection rule $G_t$, $t =
1, 2, \ldots, T$ that selects the partial-encoding function to be used
at time $t$ ($w^1_t$) based only on $\xi^1_{t-1}$ and whose performance is
no worse than that of $\Lambda_t$, $t = 1, 2, \ldots, T$. Therefore, one can
optimally restrict to selection rules for the coordinator of the
form:

$$w^1_t = G_t(\xi^1_{t-1}) \tag{45}$$

Proof: Because of Lemma 6, the optimization problem
for the coordinator is to control the evolution of $\xi^1_t$ (given
by (43)) through its actions $w^1_t$, when the instantaneous cost
depends only on $\xi^1_t$. Since $\xi^1_t$ is known to the coordinator,
this problem is similar to the control of a perfectly observed
Markov process. This observation essentially implies the result
of the theorem, as it follows from Markov decision theory
([32], Chapter 6) that to control a perfectly observed Markov
process one can restrict attention to policies that depend only
on the current state of the Markov process without any loss of
optimality. A more detailed proof using the backward induction
method of dynamic programming is given in Appendix
E.

■ 
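
The Markov decision-theoretic fact invoked in the proof of Theorem 4 can be illustrated on a toy problem. The Python sketch below runs backward induction for a generic finite-state, finite-action, finite-horizon cost-minimization problem; it is not the coordinator's problem itself, whose state $\xi^1_{t-1}$ lives in a continuous space, and all names and the example model are hypothetical.

```python
def backward_induction(n_states, n_actions, horizon, cost, transition):
    """Finite-horizon dynamic programming for a perfectly observed Markov process.

    cost(t, s, a)       -> instantaneous cost of action a in state s at time t
    transition(t, s, a) -> list of next-state probabilities
    Returns (value, policy), where policy[t][s] depends only on the current state.
    """
    value = [0.0] * n_states  # terminal cost-to-go
    policy = []
    for t in reversed(range(horizon)):
        new_value, decision = [], []
        for s in range(n_states):
            q = [
                cost(t, s, a)
                + sum(p * v for p, v in zip(transition(t, s, a), value))
                for a in range(n_actions)
            ]
            best = min(range(n_actions), key=q.__getitem__)
            decision.append(best)
            new_value.append(q[best])
        value = new_value
        policy.insert(0, decision)
    return value, policy


# Toy model: state 1 is costly; action 1 pays 0.3 to reset to state 0.
def toy_cost(t, s, a):
    return (1.0 if s == 1 else 0.0) + 0.3 * a


def toy_transition(t, s, a):
    if a == 1:
        return [1.0, 0.0]
    return [1.0, 0.0] if s == 0 else [0.0, 1.0]


print(backward_induction(2, 2, 2, toy_cost, toy_transition))
```

The returned policy is a function of time and current state only, which mirrors the restriction to selection rules of the form $w^1_t = G_t(\xi^1_{t-1})$.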

We have therefore identified the structure of the coordinator's
selection rule. The coordinator does not need to remember all
of its information, $Z^1_{1:t-1}$ and $w^1_{1:t-1}$. It can operate optimally
by just using $\xi^1_{t-1}$. We can thus conclude the following result.

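Theorem 5: In Problem P2, there is no loss of optimality
in considering decoding rules of the form in Theorem 3 with
encoders that operate as follows:
For $i = 1, 2$, define $\xi^i_0 := 0$ and for $t = 1, 2, \ldots, T$,

$$Z^i_t = f^i_t(X^i_t, b^i_t, \xi^i_{t-1}) \tag{46}$$

and

$$\xi^i_t = \gamma^i_t(\xi^i_{t-1}, Z^i_t, f^i_t(\cdot, \xi^i_{t-1})) \tag{47}$$

where $\gamma^i_t$ are fixed transformations (Lemma 6).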

Proof: The assertion of the theorem follows from
Theorem 4 and the equivalence between problem P2' and P2"
established in Step 2. The coordinator (either real or simulated
by encoder 1 and receiver) can select the partial encoding
functions by a selection rule of the form:

$$w^1_t = G_t(\xi^1_{t-1})$$

and encoder 1's symbol to be transmitted at time $t$ is given
as:

$$Z^1_t = w^1_t(X^1_t, b^1_t)$$

Thus, $Z^1_t$ is a function of $X^1_t$, $b^1_t$ and $\xi^1_{t-1}$ that was used to
select $w^1_t$. That is,

$$Z^1_t = f^1_t(X^1_t, b^1_t, \xi^1_{t-1})$$

where $w^1_t(\cdot) = f^1_t(\cdot, \xi^1_{t-1})$. The coordinator (real or simulated)
then updates $\xi^1_{t-1}$ according to Lemma 6 as:

$$\xi^1_t = \gamma^1_t(\xi^1_{t-1}, Z^1_t, w^1_t)$$

The same argument holds for encoder 2 as well. ■
D. Discussion 

Observe that $Z^i_{1:t-1}$ appearing in the argument of optimal
encoding functions in (39) has been replaced by $\xi^i_{t-1}$. By
definition, $\xi^i_t$ is a joint belief on $\mathcal{X}^i$ and $\Delta(\mathcal{A})$; therefore, it
belongs to a time-invariant space, namely, the space of joint
beliefs on $\mathcal{X}^i$ and $\Delta(\mathcal{A})$. Thus the domain of the optimal
encoding functions in (46) is time-invariant. However, $\xi^i_t$
above is a joint belief on a finitely-valued random variable
($X^i_t$) and a real-valued vector ($b^i_t$). Thus, we have an infinite-dimensional
sufficient statistic for the encoder. Clearly, such
a sufficient statistic cannot be directly used for implementation.
However, it may still be used in identifying good
approximations to optimal encoders. Below, we present some
cases where the above structural result may suggest finite-dimensional
representations of the sufficient statistic.

E. Special Cases 

1) A observed at the Encoders: Consider the case when the
encoder's observations at time $t = 1$ include the realization of
the random variable $A$. Clearly, the encoder's belief on $A$ ($b^i_t$)
can be replaced by the true value of $A$ in Theorem 5. Thus,
for problem P2, there is an optimal encoding rule of the form:

$$Z^i_t = f^i_t(X^i_t, A, P(X^i_{t-1}, A \mid Z^i_{1:t-1}, f^i_{1:t-1})) \tag{48}$$

Since $A$ belongs to a finite set, the domain of the encoding
functions in (48) consists of the scalars $X^i_t$ and $A$ and a belief
on the finite space $\mathcal{X}^i \times \mathcal{A}$. Thus when $A$ is observed at the
encoders, we have a finite dimensional sufficient statistic for
each encoder.

2) Independent Observations at Encoders: Consider the
case when the encoders' observations are independent Markov
chains. This is essentially the case when $A$ is constant with
probability 1. Then, effectively, all agents know $A$. In this
case, the result of (48) reduces to

$$Z^i_t = f^i_t(X^i_t, P(X^i_{t-1} \mid Z^i_{1:t-1}, f^i_{1:t-1})) \tag{49}$$

and we have a finite dimensional sufficient statistic for the
encoders.
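
For this special case the belief is a pmf on a finite set, so its update can be written out explicitly. The Python sketch below shows one plausible form of the update, assuming a known transition matrix, a noiseless channel, and a partial-encoding function that depends on the current state only: predict with the transition matrix, then keep the states consistent with the transmitted symbol. The function name and table layout are hypothetical and this is not the paper's transformation $\gamma^i_t$ verbatim.

```python
def update_coordinator_belief(xi_prev, z, w, trans):
    """Update P(X_t = x | z_{1:t}) from P(X_{t-1} = x | z_{1:t-1}).

    xi_prev : belief on the previous state
    z       : symbol transmitted at time t (observed noiselessly)
    w       : partial-encoding function used at time t, w(x) -> symbol
    trans   : trans[x][x2] = P(X_t = x2 | X_{t-1} = x)
    """
    n = len(xi_prev)
    # Prediction step through the Markov transition matrix.
    predicted = [sum(xi_prev[x] * trans[x][x2] for x in range(n)) for x2 in range(n)]
    # Correction step: only states that would have produced z remain possible.
    unnormalized = [predicted[x2] if w(x2) == z else 0.0 for x2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("symbol z has zero probability under this belief and encoder")
    return [u / total for u in unnormalized]


trans = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.25, 0.25, 0.5],
]
# The encoder reports whether the state is 0 or not.
print(update_coordinator_belief([1.0, 0.0, 0.0], 1, lambda x: 0 if x == 0 else 1, trans))
```

The belief always has as many entries as the source alphabet, regardless of how many symbols have been sent.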

3) Binary A: Consider the case when $A$ can take only two
values: 0 or 1. Then the encoder's belief $b^i_t$ can be described
by a single real number,

$$b^i_t := P(A = 0 \mid X^i_{1:t})$$

Thus, $\xi^i_{t-1}$ in Theorem 4 involves forming a joint belief on a
finitely valued $X^i_{t-1}$ and a real number $b^i_{t-1} \in [0, 1]$. Although
probability distributions on real numbers cannot be stored in
the encoder's memory, we can still work with approximate
versions of these beliefs. For example, we may decide to store
the cumulative distribution function for only certain values in
[0, 1] as an approximation of the true distribution. This would
give a time-invariant finite dimensional approximation of the
encoder's information. Approximate ways of evaluating and
updating these approximate beliefs would be required for this
scheme to become feasible.
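
One way to picture such an approximation is the Python sketch below. It assumes the joint belief is available as a finite list of weighted support points $(x, b, \text{weight})$ and tabulates the joint cumulative distribution on a fixed grid of $b$ values; the representation, the grid and the function name are all hypothetical choices, not a scheme proposed in the text.

```python
def cdf_on_grid(samples, n_states, grid):
    """Tabulate P(X = x, b <= g) for each state x and each grid value g.

    samples : list of (x, b, weight) support points of the joint belief
    grid    : increasing list of values in [0, 1] at which the CDF is stored
    The returned table has a fixed size n_states * len(grid).
    """
    total = sum(weight for _, _, weight in samples)
    table = [[0.0] * len(grid) for _ in range(n_states)]
    for x, b, weight in samples:
        for k, g in enumerate(grid):
            if b <= g:
                table[x][k] += weight / total
    return table


samples = [(0, 0.1, 1.0), (0, 0.6, 1.0), (1, 0.9, 2.0)]
grid = [0.25, 0.5, 0.75, 1.0]
for row in cdf_on_grid(samples, 2, grid):
    print(row)
```

The table size depends only on the alphabet size and the grid, so the stored approximation stays time-invariant; how to update it from one time step to the next is left open here, as in the text.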
V. Extensions

We apply our results for Problems P1 and P2 to other related
problems in this section.

A. Multiple (n) encoders and single receiver problem 




Fig. 5. Problem with n encoders 

Consider the model of Figure 5 where we have $n$ ($n > 2$)
encoders that partially observe a Markov source and encode
their observations, in real-time, into sequences of discrete
symbols that are transmitted over separate noisy channels
(with independent noise) to a common receiver. We make
assumptions analogous to assumptions A1 and A2 for Problem
P1, that is,

1. Assumption 1: The state of the Markov source is given
as:

$$X_t := (X^1_t, X^2_t, \ldots, X^n_t, A)$$

where $A$ is a time-invariant random variable and conditioned
on $A$, $X^1_t, X^2_t, \ldots, X^n_t$ are conditionally independent Markov
chains. The $i$th encoder observes the process $X^i_t$, $t = 1, 2, \ldots$
and uses encoding functions of the form:

$$Z^i_t = f^i_t(X^i_{1:t}, Z^i_{1:t-1})$$

for $i = 1, 2, \ldots, n$.

2. Assumption 2: We have a finite memory receiver that
maintains a separate memory for symbols received from each
channel. This memory is updated as follows:

$$M^i_1 = l^i_1(Y^i_1), \quad i = 1, 2, \ldots, n \tag{50a}$$
$$M^i_t = l^i_t(M^i_{t-1}, Y^i_t), \quad i = 1, 2, \ldots, n \tag{50b}$$

where $M^i_t$ belongs to the finite alphabet $\mathcal{M}^i$, and $l^i_t$ are the
memory update functions at time $t$ for $i = 1, 2, \ldots, n$. The
receiver produces an estimate of the source $\hat{X}_t$ based on its
memory contents at time $t - 1$ and the symbols received at
time $t$, that is,

$$\hat{X}_t = g_t(Y^1_t, Y^2_t, \ldots, Y^n_t, M^1_{t-1}, M^2_{t-1}, \ldots, M^n_{t-1}) \tag{51}$$

A non-negative distortion function $\rho_t(X_t, \hat{X}_t)$ measures the
instantaneous distortion at time $t$. We can now formulate the
following problem.

Problem P3: With the assumptions 1 and 2 as above, and
given source and channel statistics, the encoding alphabets,
the distortion functions $\rho_t$ and a time horizon $T$, the objective
is to find globally optimal encoding, decoding and memory
update functions $f^1_{1:T}, f^2_{1:T}, \ldots, f^n_{1:T}$, $g_{1:T}$, $l^1_{1:T}, l^2_{1:T}, \ldots, l^n_{1:T}$ so
as to minimize

$$J(f^1_{1:T}, \ldots, f^n_{1:T}, g_{1:T}, l^1_{1:T}, \ldots, l^n_{1:T}) = E\left[\sum_{t=1}^{T} \rho_t(X_t, \hat{X}_t)\right] \tag{52}$$

For this problem we can establish, by arguments similar to
those used in the problems with two encoders, the following
results (Theorems 6 and 7) that are analogous to Theorem 1
and Theorem 5 respectively.

Theorem 6: There exist globally optimal encoding rules of
the form:

$$Z^i_t = f^i_t(X^i_t, b^i_t, \mu^i_t) \tag{53}$$

where $b^i_t := P(A \mid X^i_{1:t})$ and $\mu^i_t := P(M^i_{t-1} \mid Z^i_{1:t-1}, l^i_{1:t-1})$.
The optimal decoding rules are of the form:

$$\hat{X}_t = \tau_t(\psi_t) \tag{54}$$

where $\psi_t := P(X_t \mid Y^1_t, Y^2_t, \ldots, Y^n_t, M^1_{t-1}, M^2_{t-1}, \ldots, M^n_{t-1})$
and $\tau_t$ is as defined in Definition 4.

Proof: Consider any arbitrary choice of encoding functions
for encoder 2 through encoder $n$ and arbitrary choice
of the decoding and memory update functions at the receiver.
Then the problem for encoder 1 is essentially the same as in the
case when $n = 2$. ■
Theorem 7: Consider Problem P3 with noiseless channels
(that is, $Y^i_t = Z^i_t$) and perfect receiver memory (that is,
$M^i_{t-1} = Z^i_{1:t-1}$). Then there is no loss of optimality
in considering decoding rules of the form $\hat{X}_t = \tau_t(\psi_t)$
where $\psi_t = P(X_t \mid Z^1_{1:t}, \ldots, Z^n_{1:t})$ with encoders that operate
as follows:

For $i = 1, 2, \ldots, n$, define $\xi^i_0 := 0$ and for $t = 1, 2, \ldots, T$,

$$Z^i_t = f^i_t(X^i_t, b^i_t, \xi^i_{t-1}) \tag{55}$$

and

$$\xi^i_t = \gamma^i_t(\xi^i_{t-1}, Z^i_t, f^i_t(\cdot, \xi^i_{t-1})) \tag{56}$$

where $\gamma^i_t$ are fixed transformations (Lemma 6).

Proof: The result follows from Theorem 5 using similar
arguments as in the proof of Theorem 6. ■



B. Point-to-Point Systems 



[Figure: block diagram of a source, an encoder and a receiver producing $\hat{X}_t$]

Fig. 6. Side-Information Problem



1) A Side Information Channel: Consider Problem P1 or
P2 with encoder 2's strategy fixed as follows:

Then the multi-terminal communication problems reduce to
point-to-point communication problems with side-information
available at the receiver (See, for example, Fig. 6). It is clear
that the results of Theorem 1 and Theorem 2, for noisy
channels, and Theorem 5, for noiseless channels, remain valid
for these side-information problems as well (since they are
true for any arbitrary choice of encoder 2's strategy).

2) Unknown Transition Matrix: Consider a point-to-point
communication system where an encoder is communicating
its observations of a Markov source $X_t$ to a receiver (Fig. 7).
The channel may be noisy or noiseless, the receiver may have
finite memory or perfect recall. Structural results for optimal
real-time encoding rules have been obtained in cases when the
transition probabilities of the Markov source are known ([24],
[25]). Consider now the case where the encoder observes a
Markov chain $X_t$ whose transition matrix is not known. However,
the set of possible transition matrices is parameterized
by a parameter $A$ with a known prior distribution over a finite
set $\mathcal{A}$. The encoding functions are of the form:

$$Z_t = f_t(X_{1:t}, Z_{1:t-1})$$

where $Z_t$ is the transmitted symbol at time $t$. The receiver
receives a noisy version of $Z_t$ given by

$$Y_t = h_t(Z_t, N_t)$$

where $N_t$ is the noise in the channel. The receiver maintains
a finite memory that is updated as follows:

$$M_1 = l_1(Y_1)$$

$$M_t = l_t(Y_t, M_{t-1})$$

where $M_t \in \mathcal{M}$, $\forall t$. The receiver's estimate at time $t$ is given
as:

$$\hat{X}_t = g_t(Y_t, M_{t-1})$$

A non-negative distortion function $\rho_t(X_t, \hat{X}_t)$ measures the
instantaneous distortion at time $t$. We consider the following
problem:

[Figure: block diagram of a Markov source with unknown statistics, an encoder and a receiver]

Fig. 7. Point-to-point system with unknown source statistics

Problem P4: Given the source and receiver model as above
and the noise statistics, the encoding alphabets, the channel
functions $h_t$, the distortion functions $\rho_t$ and a time horizon $T$,
the objective is to find globally optimal encoding, decoding
and memory update functions $f_{1:T}$, $g_{1:T}$, $l_{1:T}$ so as to minimize

$$J(f_{1:T}, g_{1:T}, l_{1:T}) = E\left[\sum_{t=1}^{T} \rho_t(X_t, \hat{X}_t)\right] \tag{57}$$

The methodology employed for the analysis of Problem P1
can be used to establish the following result.



Theorem 8: There exist globally optimal encoding rules of 
the form: 

$$Z_t = f_t(X_t, b_t, \mu_t) \qquad (58)$$

where $b_t := P(A \mid X_{1:t})$ and $\mu_t := P(M_{t-1} \mid Z_{1:t-1}, l_{1:t-1})$. 
The optimal decoding rules are of the form: 

$$\hat{X}_t = \tau_t(\psi_t) \qquad (59)$$

where $\psi_t := P(X_t \mid Y_t, M_{t-1}, f_{1:t}, l_{1:t})$ and $\tau_t$ is as defined in 
Definition 4. 

Proof: We can view the optimization problem P4 as a 
special case of Problem P1 with an imaginary second encoder 
that makes no observations of the source and sends no message 
to the receiver (that is, the sets $\mathcal{X}^2$ and $\mathcal{Z}^2$ are empty). Thus, 
the results of the above theorem follow from Theorem 1 and 
Theorem 2. ■ 
The methodology developed for the analysis of Problem P2 
can be used to obtain the following result. 

Theorem 9: Consider Problem P4 with a noiseless channel 
(that is, $Y_t = Z_t$) and perfect receiver memory (that is, 
$M_{t-1} = Z_{1:t-1}$). Then there is no loss of optimality in 
considering encoding rules of the form: 

$$Z_t = f_t(X_t, b_t, \xi_{t-1})$$

where $b_t := P(A \mid X_{1:t})$ and 

$$\xi_{t-1} := P(X_{t-1}, b_{t-1} \mid Z_{1:t-1})$$

with decoding rules of the form: 

$$\hat{X}_t = \tau_t(\psi_t) \qquad (60)$$

where $\psi_t := P(X_t = x \mid Z_{1:t})$ and $\tau_t$ is as defined in Definition 
4. 

Proof: The result follows from Theorem 5 using arguments 
similar to those in the proof of Theorem 8. ■ 

C. kth order Markov Source 

Consider Problem P1 or P2 with a source model given by 
the following equations: 

$$X^1_{t+1} = F^1_t(X^1_t, X^1_{t-1}, \ldots, X^1_{t-k+1}, A, W^1_t) \qquad (61a)$$

$$X^2_{t+1} = F^2_t(X^2_t, X^2_{t-1}, \ldots, X^2_{t-k+1}, A, W^2_t) \qquad (61b)$$

Thus, conditioned on a global, time-invariant random variable 
$A$, $X^1_t$ and $X^2_t$ are conditionally independent $k$th order Markov 
processes. It is straightforward to consider a Markovian 
reformulation of the source by defining 

$$B^i_t := (X^i_1, X^i_2, \ldots, X^i_t)$$

for $i = 1, 2$ and $t \le k$ and 

$$B^i_t := (X^i_{t-k+1}, X^i_{t-k+2}, \ldots, X^i_t)$$

for $i = 1, 2$ and $t > k$. We then have that 

$$B^i_{t+1} = \tilde{F}^i_t(B^i_t, A, W^i_t) \qquad (62)$$

for $i = 1, 2$. Thus, we now have a Markov system (when 
conditioned on $A$), with $B^i_t$ as encoder $i$'s observations, 
for which our structural results directly apply. 
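
The stacking step in this reformulation can be illustrated with a short Python sketch. It assumes, as one reading of the construction, that the stacked state holds all observations while fewer than k are available and the most recent k observations afterwards. The function name `stack_observations` and the sample path are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence, Tuple


def stack_observations(xs: Sequence[int], k: int) -> List[Tuple[int, ...]]:
    """Return the stacked states B_1, ..., B_T for a k-th order source.

    B_t holds (x_1, ..., x_t) while t <= k and (x_{t-k+1}, ..., x_t)
    afterwards, so B_{t+1} is determined by B_t and the new observation.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Slice indices are 0-based, while t runs over 1..T as in the text.
    return [tuple(xs[max(0, t - k):t]) for t in range(1, len(xs) + 1)]


# Hypothetical sample path of one encoder's observations.
sample_path = [3, 1, 4, 1, 5]
stacked = stack_observations(sample_path, k=2)
```

Each stacked state shares all but its newest entry with its predecessor, which is why the stacked process is first-order Markov whenever the original process is kth order Markov given A.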
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D. Communication with Finite Delay 

Consider the models of Problem P1 or P2 with the following 
objective function: 

$$\mathbb{E}\left\{ \sum_{t=d+1}^{T+d} \rho_t(X_{t-d}, \hat{X}_t) \right\}$$

The above objective can be interpreted as the total expected 
distortion incurred when the receiver can allow a small finite 
delay, $d$, before making its final estimate on the state 
of the source. Thus, the receiver produces a sequence of 
source estimates $\hat{X}_{d+1}, \hat{X}_{d+2}, \ldots, \hat{X}_{T+d}$, and incurs a distortion 
$\sum_{t=d+1}^{T+d} \mathbb{E}[\rho_t(X_{t-d}, \hat{X}_t)]$. We can transform this problem 
to our problem by the following regrouping of variables. 
For $i = 1, 2$ and $t = 1, 2, \ldots, d$ define 

$$B^i_t := (X^i_1, X^i_2, \ldots, X^i_t) \qquad (63)$$

For $t = d+1, \ldots, T$, define 

$$B^i_t := (X^i_{t-d}, X^i_{t-d+1}, \ldots, X^i_t) \qquad (64)$$

and for $t = T+1, T+2, \ldots, T+d$ 

$$B^i_t := (X^i_{t-d}, X^i_{t-d+1}, \ldots, X^i_T) \qquad (65)$$

Then, it is easily seen that conditioned on $A$, $B^1_t$ and $B^2_t$ 
are two conditionally independent Markov chains. Moreover, 
the distortion function $\rho_t(X_{t-d}, \hat{X}_t)$ can be expressed as 
$\tilde{\rho}_t(B^1_t, B^2_t, A, \hat{X}_t)$. Thus, we have modified the problem to 
an instance of Problem P1 or P2 with $B^i_t$ as encoder $i$'s 
observation. 

VI. Conclusion 

We considered a real-time communication problem where 
two encoders make distinct partial observations of a discrete- 
time Markov source and communicate in real-time with a 
common receiver which needs to estimate some function of 
the state of the Markov source in real-time. We assumed a 
specific model for the source that arises in some applications 
of interest. In this model, the encoders' observations are con- 
ditionally independent Markov chains given an unobserved, 
time-invariant random variable. We formulated a communi- 
cation problem with separate noisy channels between each 
encoder and the receiver and a separated finite memory at the 
receiver. We obtained finite-dimensional sufficient statistics for 
the encoders in this problem. The structure of the source and 
the receiver played a critical role in obtaining these results. 

We then considered the communication problem over noise- 
less channels and perfect receiver memory. We developed 
a new methodology to identify structural results for this 
problem. The new approach highlights the importance of 
common information in decentralized team problems. We used 
the presence of common information between an encoder and 
the receiver to identify a sufficient statistic of the decoder that 
has a time-invariant domain. 

We have not addressed the problem of finding globally 
optimal real-time encoding and decoding strategies in this 
paper. A sequential decomposition of the global optimization 
problem, for a special case of the problems formulated here, 
appears in [34]. 



Appendix I 
Proof of Lemma 1 

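For a realization $x^1_{1:t}$ of $X^1_{1:t}$, we have by definition, 

$$b^1_t(a) = P(A = a \mid x^1_{1:t}) = \frac{P(X^1_t = x^1_t, A = a \mid x^1_{1:t-1})}{\sum_{a' \in \mathcal{A}} P(X^1_t = x^1_t, A = a' \mid x^1_{1:t-1})} \qquad (66)$$

where we used Bayes' rule in (66). The numerator in (66) can 
be written as 

$$P(X^1_t = x^1_t \mid A = a, x^1_{1:t-1}) \cdot P(A = a \mid x^1_{1:t-1}) = P(X^1_t = x^1_t \mid A = a, x^1_{t-1}) \cdot b^1_{t-1}(a) \qquad (67)$$

where we used the Markov nature of $X^1_t$ when conditioned 
on $A$. Thus, for a fixed $a$, the numerator in (66) depends 
only on $x^1_t$, $x^1_{t-1}$ and the previous belief $b^1_{t-1}$. Since the same 
factorization holds for each term in the denominator, we have 
that 

$$b^1_t = \alpha^1_t(b^1_{t-1}, X^1_t, X^1_{t-1}),$$

where $\alpha^1_t$, $t = 2, 3, \ldots, T$ are deterministic transformations. 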
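
As an illustration of this recursion, the following Python sketch updates the belief on A one observed transition at a time. It is a minimal sketch under stated assumptions: the source is a two-state chain, the candidate transition matrices in `TRANSITION` and the labels "sticky" and "mixing" are hypothetical, and the distribution of the first observation is taken not to depend on A, so that the initial belief equals the prior.

```python
from typing import Dict, List

Belief = Dict[str, float]


def update_belief(prev_belief: Belief, x_prev: int, x_curr: int,
                  transition: Dict[str, List[List[float]]]) -> Belief:
    """One step of the belief recursion b_t = alpha_t(b_{t-1}, x_t, x_{t-1}).

    prev_belief maps each parameter value a to P(A = a | x_{1:t-1});
    transition[a][x_prev][x_curr] is P(X_t = x_curr | A = a, X_{t-1} = x_prev).
    """
    # Numerator of Bayes' rule for each a: transition likelihood times old belief.
    weights = {a: transition[a][x_prev][x_curr] * p for a, p in prev_belief.items()}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("transition has zero probability under every parameter value")
    return {a: w / total for a, w in weights.items()}


# Hypothetical candidate transition matrices, one per value of A.
TRANSITION = {
    "sticky": [[0.9, 0.1], [0.1, 0.9]],
    "mixing": [[0.5, 0.5], [0.5, 0.5]],
}
prior = {"sticky": 0.5, "mixing": 0.5}

path = [0, 0, 0, 1]
belief = dict(prior)  # assumed: the first observation carries no information on A
for x_prev, x_curr in zip(path, path[1:]):
    belief = update_belief(belief, x_prev, x_curr, TRANSITION)
```

The update reads only the previous belief and the last two observations, which is the property that lets the encoder discard the rest of its observation history.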

Appendix II 
Proof of Lemma 2 

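By definition of $\mu^1_t$, we have 

$$\mu^1_t(m) = P(M^1_{t-1} = m \mid Z^1_{1:t-1}, l^1_{1:t-1}) = P(l^1_{t-1}(M^1_{t-2}, Y^1_{t-1}) = m \mid Z^1_{1:t-1}, l^1_{1:t-1}) \qquad (68)$$

With the memory update rules $l^1_{1:t-1}$ fixed, the probability 
in (68) can be evaluated from the conditional distribution 
$P(M^1_{t-2}, Y^1_{t-1} \mid Z^1_{1:t-1}, l^1_{1:t-1})$. For $m' \in \mathcal{M}^1$ and $y \in \mathcal{Y}^1$, 
this conditional distribution is given as 

$$\begin{aligned}
& P(M^1_{t-2} = m', Y^1_{t-1} = y \mid Z^1_{1:t-1}, l^1_{1:t-1}) \\
&= P(Y^1_{t-1} = y \mid M^1_{t-2} = m', Z^1_{1:t-1}, l^1_{1:t-1}) \times P(M^1_{t-2} = m' \mid Z^1_{1:t-1}, l^1_{1:t-1}) && (69) \\
&= P(Y^1_{t-1} = y \mid Z^1_{t-1}) \cdot P(M^1_{t-2} = m' \mid Z^1_{1:t-2}, l^1_{1:t-2}) && (70) \\
&= P(Y^1_{t-1} = y \mid Z^1_{t-1}) \cdot \mu^1_{t-1}(m') && (71)
\end{aligned}$$

where we used the fact that the channel noise at time $t-1$ ($N^1_{t-1}$) is 
independent of the past noise variables and the Markov source 
in (70). Thus, we only need $Z^1_{t-1}$ and $\mu^1_{t-1}$ to form the joint 
belief in (69). Consequently, we can evaluate $\mu^1_t(m)$ just from 
$Z^1_{t-1}$ and $\mu^1_{t-1}$. Thus, 

$$\mu^1_t = \beta^1_t(\mu^1_{t-1}, Z^1_{t-1})$$

where $\beta^1_t$, $t = 2, 3, \ldots, T$ are deterministic transformations. 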
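
The update of the encoder's belief on the receiver memory can be sketched in Python as follows. This is an illustrative sketch only: the binary symmetric channel `BSC`, the one-bit memory rule `xor_memory`, and the starting belief are hypothetical choices, and the channel is represented as a table of output probabilities for each transmitted symbol.

```python
from typing import Callable, Dict

MemoryBelief = Dict[int, float]


def update_memory_belief(prev_mu: MemoryBelief, z: int,
                         channel: Dict[int, Dict[int, float]],
                         memory_update: Callable[[int, int], int]) -> MemoryBelief:
    """One step of the recursion for the belief on the receiver memory.

    prev_mu is the belief on the previous memory state, z is the symbol just
    transmitted, channel[z][y] is P(Y = y | Z = z), and memory_update(m, y)
    is the receiver's deterministic memory update rule.
    """
    new_mu: MemoryBelief = {}
    for m_prev, p_m in prev_mu.items():
        for y, p_y in channel[z].items():
            # The joint probability of (m_prev, y) factors as P(y | z) * prev_mu(m_prev);
            # push that mass onto the memory state it produces.
            m_next = memory_update(m_prev, y)
            new_mu[m_next] = new_mu.get(m_next, 0.0) + p_y * p_m
    return new_mu


# Hypothetical binary symmetric channel with crossover probability 0.1.
BSC = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}


def xor_memory(m_prev: int, y: int) -> int:
    """Hypothetical one-bit memory: parity of the symbols received so far."""
    return m_prev ^ y


mu = {0: 1.0}  # assumed starting belief: the memory is known to be 0
for z in [1, 1]:
    mu = update_memory_belief(mu, z, BSC, xor_memory)
```

The function never looks at earlier transmitted symbols, which mirrors the claim that the previous belief and the latest transmitted symbol are enough to form the new belief.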

Appendix III 
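Proof of Lemma 5 

For fixed $f^2_{1:t}$, and for a given realization of the received 
symbols $z^1_{1:t}$, $z^2_{1:t}$ and the partial encoding functions $w^1_{1:t}$, the 
receiver's belief on the state of the source at time $t$ is given 
as: 

$$\psi_t(x) := P(X_t = x \mid z^1_{1:t}, z^2_{1:t}, w^1_{1:t}, f^2_{1:t}) \qquad (72)$$

where $x = (x^1, x^2, a)$. Using Bayes' rule, we have 

$$\psi_t(x) = \frac{P(X_t = x, Z^2_{1:t} = z^2_{1:t} \mid z^1_{1:t}, w^1_{1:t}, f^2_{1:t})}{\sum_{x' \in \mathcal{X}} P(X_t = x', Z^2_{1:t} = z^2_{1:t} \mid z^1_{1:t}, w^1_{1:t}, f^2_{1:t})} \qquad (73)$$

The numerator in the right hand side of (73) can be written as 

$$\begin{aligned}
& P(Z^2_{1:t} = z^2_{1:t} \mid z^1_{1:t}, X_t = x, w^1_{1:t}, f^2_{1:t}) \times P(X_t = x \mid z^1_{1:t}, w^1_{1:t}, f^2_{1:t}) \\
&= P(Z^2_{1:t} = z^2_{1:t} \mid X^2_t = x^2, A = a, f^2_{1:t}) \times P(X^2_t = x^2 \mid A = a) \times P(X^1_t = x^1, A = a \mid z^1_{1:t}, w^1_{1:t})
\end{aligned} \qquad (74)$$

where we used conditional independence of $Z^2_{1:t}, X^2_t$ and 
$Z^1_{1:t}, X^1_t$ given $A$ for the first term in (74) and the fact that 
$X_t = (X^1_t, X^2_t, A)$ in the second term of the left hand side of (74). 
Since the second encoder is fixed, the first term in the right 
hand side of (74) is a known statistic which depends on $z^2_{1:t}$. 
The second term is again a known source statistic. Consider 
the last term in (74). It can be expressed as follows: 

$$\begin{aligned}
& \int_{b' \in \Delta(\mathcal{A})} P(X^1_t = x^1, A = a, b^1_t = b' \mid z^1_{1:t}, w^1_{1:t}) \\
&= \int_{b' \in \Delta(\mathcal{A})} \left[ P(A = a \mid b^1_t = b', x^1, z^1_{1:t}, w^1_{1:t}) \times P(X^1_t = x^1, b^1_t = b' \mid z^1_{1:t}, w^1_{1:t}) \right] \\
&= \int_{b' \in \Delta(\mathcal{A})} b'(a) \times P(X^1_t = x^1, b^1_t = b' \mid z^1_{1:t}, w^1_{1:t}) && (75) \\
&= \int_{b' \in \Delta(\mathcal{A})} b'(a) \times \xi^1_t(x^1, b') && (76)
\end{aligned}$$

Similar representations also hold for each term in the 
denominator of (73). It follows then that with a fixed $f^2_{1:t}$, $\psi_t(x)$ 
depends only on the realization of the second encoder's messages 
$Z^2_{1:t}$ and $\xi^1_t$. Thus, from (74) and (76), we conclude that $\psi_t$ 
can be evaluated from $\xi^1_t$ and $Z^2_{1:t}$ by means of deterministic 
transformations. We will call this overall transformation $\delta_t$. 
Thus, we have 

$$\psi_t = \delta_t(\xi^1_t, Z^2_{1:t}) \qquad (77)$$

Since the estimate $\hat{X}_t$ is a function of $\psi_t$ (cf. Theorem 3), we 
conclude that 

$$\hat{X}_t = \tau_t(\delta_t(\xi^1_t, Z^2_{1:t}))$$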



Appendix IV 
Proof of Lemma 6 

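1) Consider a realization $z^1_{1:t}$ and $w^1_{1:t}$. 
By definition, the realization of $\xi^1_t$ is given as 

$$\tilde{\xi}^1_t(x^1_t, \tilde{b}^1_t) = P(X^1_t = x^1_t, b^1_t = \tilde{b}^1_t \mid z^1_{1:t}, w^1_{1:t}) \qquad (78)$$

Using Bayes' rule, we have 

$$\tilde{\xi}^1_t(x^1_t, \tilde{b}^1_t) = \frac{P(X^1_t = x^1_t, b^1_t = \tilde{b}^1_t, Z^1_t = z^1_t \mid z^1_{1:t-1}, w^1_{1:t})}{\sum_{x'} \int_{b' \in \Delta(\mathcal{A})} P(X^1_t = x', b^1_t = b', Z^1_t = z^1_t \mid z^1_{1:t-1}, w^1_{1:t})} \qquad (79)$$

We can write the numerator as: 

$$\begin{aligned}
& P(Z^1_t = z^1_t \mid X^1_t = x^1_t, b^1_t = \tilde{b}^1_t, z^1_{1:t-1}, w^1_{1:t}) \times P(X^1_t = x^1_t, b^1_t = \tilde{b}^1_t \mid z^1_{1:t-1}, w^1_{1:t}) \\
&= P(Z^1_t = z^1_t \mid X^1_t = x^1_t, b^1_t = \tilde{b}^1_t, w^1_t) \times P(X^1_t = x^1_t, b^1_t = \tilde{b}^1_t \mid z^1_{1:t-1}, w^1_{1:t})
\end{aligned} \qquad (80)$$

The simplification of the first term in (80) holds since $Z^1_t = w^1_t(X^1_t, b^1_t)$. The second 
term in (80) can be further written as: 

$$\begin{aligned}
& \sum_{x'', a} \int_{b'} P(X^1_t = x^1_t, b^1_t = \tilde{b}^1_t, X^1_{t-1} = x'', A = a, b^1_{t-1} = b' \mid z^1_{1:t-1}, w^1_{1:t}) \\
&= \sum_{x'', a} \int_{b'} \big[ P(b^1_t = \tilde{b}^1_t \mid b^1_{t-1} = b', X^1_t = x^1_t, X^1_{t-1} = x'') \\
&\qquad \times P(X^1_t = x^1_t \mid A = a, X^1_{t-1} = x'') \\
&\qquad \times P(A = a \mid b^1_{t-1} = b', X^1_{t-1} = x'', z^1_{1:t-1}, w^1_{1:t-1}) \\
&\qquad \times P(X^1_{t-1} = x'', b^1_{t-1} = b' \mid z^1_{1:t-1}, w^1_{1:t-1}) \big] && (81) \\
&= \sum_{x'', a} \int_{b'} \big[ P(b^1_t = \tilde{b}^1_t \mid b^1_{t-1} = b', X^1_t = x^1_t, X^1_{t-1} = x'') \\
&\qquad \times P(X^1_t = x^1_t \mid A = a, X^1_{t-1} = x'') \\
&\qquad \times P(A = a \mid b^1_{t-1} = b') \times \tilde{\xi}^1_{t-1}(x'', b') \big] && (82)
\end{aligned}$$

where we used Lemma 1 and the Markov property of $X^1_t$ 
given $A$ in (81). The first term in (82) is simply 1 or 0 since 
$b^1_t$ is a deterministic function of $b^1_{t-1}$, $X^1_t$ and $X^1_{t-1}$. The 
second term is a known source statistic and the third term is 
$b'(a)$. Similar expressions hold for the denominator in (79). 
Thus, from (79)-(82), we conclude that to evaluate $\tilde{\xi}^1_t(x^1_t, \tilde{b}^1_t)$ 
we only need $z^1_t$, $w^1_t$ and $\tilde{\xi}^1_{t-1}$. This establishes equation (43). 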



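2) With encoder 2's strategy fixed, the expected instantaneous 
cost from the coordinator's perspective is given as: 

$$\mathbb{E}\{ \rho_t(X_t, \hat{X}_t) \mid z^1_{1:t}, w^1_{1:t} \} = \mathbb{E}\{ \rho_t(X^1_t, X^2_t, A, \tau_t(\delta_t(\tilde{\xi}^1_t, Z^2_{1:t}))) \mid z^1_{1:t}, w^1_{1:t}, \tilde{\xi}^1_t \}, \qquad (83)$$

since $\tilde{\xi}^1_t$ is a function of $z^1_{1:t}, w^1_{1:t}$, hence it can be included in 
the conditioning variables. Thus, the only random variables in 
the above expectation are $X^1_t$, $X^2_t$, $A$ and $Z^2_{1:t}$. Therefore, the 
above expectation is a function of the following probability 
mass function: 

$$\begin{aligned}
& P(X^1_t = x^1_t, X^2_t = x^2_t, A = a, Z^2_{1:t} = z^2_{1:t} \mid z^1_{1:t}, w^1_{1:t}, \tilde{\xi}^1_t) \\
&= P(Z^2_{1:t} = z^2_{1:t}, X^2_t = x^2_t \mid A = a, X^1_t = x^1_t, z^1_{1:t}, w^1_{1:t}, \tilde{\xi}^1_t) \\
&\qquad \times \int_{b' \in \Delta(\mathcal{A})} \left[ P(X^1_t = x^1_t, A = a, b^1_t = b' \mid z^1_{1:t}, w^1_{1:t}, \tilde{\xi}^1_t) \right] \\
&= P(Z^2_{1:t} = z^2_{1:t}, X^2_t = x^2_t \mid A = a, X^1_t = x^1_t, z^1_{1:t}, w^1_{1:t}, \tilde{\xi}^1_t) \\
&\qquad \times \int_{b' \in \Delta(\mathcal{A})} \left[ P(A = a \mid b^1_t = b', x^1_t, z^1_{1:t}, w^1_{1:t}, \tilde{\xi}^1_t) \times P(X^1_t = x^1_t, b^1_t = b' \mid z^1_{1:t}, w^1_{1:t}, \tilde{\xi}^1_t) \right] \\
&= P(Z^2_{1:t} = z^2_{1:t}, X^2_t = x^2_t \mid A = a) \times \int_{b' \in \Delta(\mathcal{A})} \left[ b'(a) \times P(X^1_t = x^1_t, b^1_t = b' \mid z^1_{1:t}, w^1_{1:t}, \tilde{\xi}^1_t) \right] && (84) \\
&= P(Z^2_{1:t} = z^2_{1:t}, X^2_t = x^2_t \mid A = a) \times \int_{b' \in \Delta(\mathcal{A})} \left[ b'(a) \times \tilde{\xi}^1_t(x^1_t, b') \right] && (85)
\end{aligned}$$

where we used conditional independence of the encoders' 
observations and actions given $A$ in (84). In (85), the first 
term is a fixed statistic when encoder 2's strategy is fixed and 
the second term depends only on $\tilde{\xi}^1_t$. Thus, the expectation in 
(83) can be evaluated using $\tilde{\xi}^1_t$. This establishes the second 
part of the Lemma (equation (44)). 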



Appendix V 
Proof of Theorem 4 

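For $\xi \in \Delta(\mathcal{X}^1 \times \Delta(\mathcal{A}))$, define the following functions: 

$$V_T(\xi) := \hat{\rho}_T(\xi) \qquad (86)$$

$$V_{t-1}(\xi) := \hat{\rho}_{t-1}(\xi) + \inf_{\tilde{w}} \left[ \mathbb{E}\{ V_t(\gamma_t(\xi, Z^1_t, w^1_t)) \mid \xi^1_{t-1} = \xi, w^1_t = \tilde{w} \} \right] \qquad (87)$$

for $t = T, T-1, \ldots, 2$ and 

$$V_0 := \inf_{\tilde{w}} \left[ \mathbb{E}\{ V_1(\gamma_1(\xi^1_0, Z^1_1, w^1_1)) \mid w^1_1 = \tilde{w} \} \right] \qquad (88)$$

The functions $\hat{\rho}_t$ and $\gamma_t$ are from Lemma 6. Note that the 
infimum in (87) is over all functions from the space $\mathcal{X}^1 \times 
\Delta(\mathcal{A})$ to the space $\mathcal{Z}^1$ and the infimum in (88) is over all 
functions from the space $\mathcal{X}^1$ to the space $\mathcal{Z}^1$. 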

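Consider an arbitrary selection rule $\Lambda := (\Lambda_t, t = 1, 2, \ldots, T)$ 
for the coordinator. That is, the coordinator selects the 
partial-encoding function at time $t$ as follows: 

$$w^1_t = \Lambda_t(Z^1_{1:t-1}, w^1_{1:t-1}) \qquad (89)$$

Then the coordinator's expected cost to go from time $t$ 
onwards under the selection rule $\Lambda$ is given as: 

$$J_t(z^1_{1:t}, w^1_{1:t}) := \mathbb{E}\left\{ \sum_{k=t}^{T} \rho_k(X_k, \hat{X}_k) \,\middle|\, z^1_{1:t}, w^1_{1:t} \right\} \qquad (90)$$

for $t = 1, 2, \ldots, T$. Also, the overall expected cost under 
selection rule $\Lambda$ is 

$$J_0 = \mathbb{E}\{ J_1(Z^1_1, w^1_1) \} \qquad (91)$$

We will show that for all $t = T, T-1, \ldots, 1$, we have the 
following inequality 

$$V_t(\tilde{\xi}^1_t) \le J_t(z^1_{1:t}, w^1_{1:t}) \qquad (92)$$

where $\tilde{\xi}^1_t$ is the belief on $X^1_t, b^1_t$ conditioned on $z^1_{1:t}, w^1_{1:t}$. 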
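We proceed by backward induction. At time $T$, we have 

$$\begin{aligned}
J_T(z^1_{1:T}, w^1_{1:T}) &= \mathbb{E}\{ \rho_T(X_T, \hat{X}_T) \mid z^1_{1:T}, w^1_{1:T} \} && (93) \\
&= \hat{\rho}_T(\tilde{\xi}^1_T) = V_T(\tilde{\xi}^1_T) && (94)
\end{aligned}$$

where we used part 2 of Lemma 6 (equation (44)) in (94). Thus 
(92) is true for $t = T$. Assume that (92) is true for time $t$. At 
$t - 1$, for a realization $z^1_{1:t-1}, w^1_{1:t-1}$, we have 

$$\begin{aligned}
& J_{t-1}(z^1_{1:t-1}, w^1_{1:t-1}) \\
&= \mathbb{E}\left\{ \sum_{k=t-1}^{T} \rho_k(X_k, \hat{X}_k) \,\middle|\, z^1_{1:t-1}, w^1_{1:t-1} \right\} && (95) \\
&= \mathbb{E}\{ \rho_{t-1}(X_{t-1}, \hat{X}_{t-1}) \mid z^1_{1:t-1}, w^1_{1:t-1} \} + \mathbb{E}\left\{ \sum_{k=t}^{T} \rho_k(X_k, \hat{X}_k) \,\middle|\, z^1_{1:t-1}, w^1_{1:t-1} \right\} && (96) \\
&= \hat{\rho}_{t-1}(\tilde{\xi}^1_{t-1}) + \mathbb{E}\left\{ \mathbb{E}\left\{ \sum_{k=t}^{T} \rho_k(X_k, \hat{X}_k) \,\middle|\, Z^1_{1:t}, w^1_{1:t} \right\} \,\middle|\, z^1_{1:t-1}, w^1_{1:t-1} \right\} && (97) \\
&\ge \hat{\rho}_{t-1}(\tilde{\xi}^1_{t-1}) + \mathbb{E}\{ V_t(\xi^1_t) \mid z^1_{1:t-1}, w^1_{1:t-1} \} && (98)
\end{aligned}$$

where we used part 2 of Lemma 6 for the first term in (97) 
and the induction hypothesis at time $t$ for the second term in 
(98). We will focus on the second term in (98). We have 

$$\mathbb{E}\{ V_t(\xi^1_t) \mid z^1_{1:t-1}, w^1_{1:t-1} \} = \mathbb{E}\{ V_t(\xi^1_t) \mid z^1_{1:t-1}, w^1_{1:t-1}, \tilde{\xi}^1_{t-1}, w^1_t \} \qquad (99)$$

Note that we have included $\tilde{\xi}^1_{t-1}$ and $w^1_t$ in the conditioning 
of the right hand side of (99) since under the selection 
rule $\Lambda$, they are functions of the original conditioning terms 
$z^1_{1:t-1}, w^1_{1:t-1}$. Further, using Lemma 6 for $\xi^1_t$ in (99), we get 

$$\begin{aligned}
& \mathbb{E}\{ V_t(\xi^1_t) \mid z^1_{1:t-1}, w^1_{1:t-1}, \tilde{\xi}^1_{t-1}, w^1_t \} \\
&= \mathbb{E}\{ V_t(\gamma_t(\tilde{\xi}^1_{t-1}, Z^1_t, w^1_t)) \mid z^1_{1:t-1}, w^1_{1:t-1}, \tilde{\xi}^1_{t-1}, w^1_t \} \\
&\ge \inf_{\tilde{w}} \mathbb{E}\{ V_t(\gamma_t(\tilde{\xi}^1_{t-1}, Z^1_t, \tilde{w})) \mid z^1_{1:t-1}, w^1_{1:t-1}, \tilde{\xi}^1_{t-1}, w^1_t = \tilde{w} \} \\
&= \inf_{\tilde{w}} \mathbb{E}\{ V_t(\gamma_t(\tilde{\xi}^1_{t-1}, \tilde{w}(X^1_t, b^1_t), \tilde{w})) \mid z^1_{1:t-1}, w^1_{1:t-1}, \tilde{\xi}^1_{t-1}, w^1_t = \tilde{w} \}
\end{aligned} \qquad (100)$$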

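We will now show that the right hand side in (100) is the same as 
the second term in (87) evaluated at $\tilde{\xi}^1_{t-1}$. The expectation 
in (100) depends only on $P(X^1_t, b^1_t \mid z^1_{1:t-1}, w^1_{1:t-1}, \tilde{\xi}^1_{t-1}, w^1_t = \tilde{w})$. Now, 

$$\begin{aligned}
& P(X^1_t, b^1_t \mid z^1_{1:t-1}, w^1_{1:t-1}, \tilde{\xi}^1_{t-1}, w^1_t = \tilde{w}) \\
&= \sum_{x', a} \int_{b'} \left[ P(X^1_t, b^1_t, X^1_{t-1} = x', A = a, b^1_{t-1} = b' \mid z^1_{1:t-1}, w^1_{1:t-1}, \tilde{\xi}^1_{t-1}, w^1_t = \tilde{w}) \right] \\
&= \sum_{x', a} \int_{b'} \big[ P(b^1_t \mid X^1_t, b^1_{t-1} = b', X^1_{t-1} = x') \times P(X^1_t \mid X^1_{t-1} = x', A = a) \\
&\qquad \times P(A = a \mid b^1_{t-1} = b') \times \tilde{\xi}^1_{t-1}(x', b') \big]
\end{aligned} \qquad (101)$$

where we used Lemma 1 and the Markov nature of 
$X^1_t$ when conditioned on $A$ in the right hand side of (101). 
The right hand side of (101) depends only on $\tilde{\xi}^1_{t-1}$ 
(and the known source statistics). Thus, the probability 
$P(X^1_t, b^1_t \mid z^1_{1:t-1}, w^1_{1:t-1}, \tilde{\xi}^1_{t-1}, w^1_t = \tilde{w})$ is the same as the 
probability $P(X^1_t, b^1_t \mid \tilde{\xi}^1_{t-1}, w^1_t = \tilde{w})$, hence the expression in 
the right hand side of (100) is the same as 

$$\inf_{\tilde{w}} \left[ \mathbb{E}\{ V_t(\gamma_t(\tilde{\xi}^1_{t-1}, \tilde{w}(X^1_t, b^1_t), \tilde{w})) \mid \tilde{\xi}^1_{t-1}, w^1_t = \tilde{w} \} \right]$$

which is the second term in (87) evaluated at $\tilde{\xi}^1_{t-1}$. Therefore, 
using (98), we get 

$$\begin{aligned}
J_{t-1}(z^1_{1:t-1}, w^1_{1:t-1}) &\ge \hat{\rho}_{t-1}(\tilde{\xi}^1_{t-1}) + \inf_{\tilde{w}} \left[ \mathbb{E}\{ V_t(\gamma_t(\tilde{\xi}^1_{t-1}, Z^1_t, w^1_t)) \mid \tilde{\xi}^1_{t-1}, w^1_t = \tilde{w} \} \right] \\
&= V_{t-1}(\tilde{\xi}^1_{t-1})
\end{aligned} \qquad (102)$$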

This completes the induction argument. Thus, we have that 
under any selection rule for the coordinator 

$$V_1(\xi^1_1) \le J_1(Z^1_1, w^1_1) \qquad (103)$$

Taking expectations on both sides of (103) and using the 
definition of $V_0$, we get 

$$V_0 \le J_0$$

for any selection rule $\Lambda$ for the coordinator. Now a selection 
rule found using equations (86) and (87) that at each step 
selects a $w^1_t$ based on $\xi^1_{t-1}$ such that it is at least as close to 
the infimum $V_t$ as $J_t$ will achieve a performance that is no 
worse than $\Lambda$. This establishes the Theorem. 
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